in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in 
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web 

Foreword 

This Technical Specification (TS) has been produced by ETSI Technical Committee Speech and multimedia 
Transmission Quality (STQ). 

The present document is to be used in conjunction with the ETSI standard series EG 202 396 [i.2] to [i.4]: 

Part 1: "Background noise simulation technique and background noise database"; 

Part 2: "Background noise transmission - Network simulation - Subjective test database and results"; 

Part 3: "Background noise transmission - Objective test methods". 

The present document is based on the objective test method described in EG 202 396-3 [i.4] and contains the 
modifications of the model that are required to provide a good prediction of the uplink speech quality of modern 
mobile terminals in the presence of background noise. 
1 Scope 



The present document describes testing methodologies which can be used to objectively evaluate the performance of 
narrowband and wideband mobile terminals for speech communication in the presence of background noise. 

Background noise is a problem in almost all situations and conditions and needs to be taken into account in both 
terminals and networks. The present document provides information about the testing methods applicable to objectively 
evaluate the speech quality of mobile terminals with AMR and AMR-WB codecs in the presence of background noise. 
The present document includes: 

• The method which is applicable to objectively determine the different parameters influencing the speech 
quality in the presence of background noise, taking into account: 

   - the speech quality; 

   - the background noise transmission quality; 

   - the overall quality. 

• The description of the adaptation of the test method described in EG 202 396-1 [i.2]. 

• The model results in comparison with the underlying subjective tests used for the retraining of the objective 
model. 

• The model validation results. 

The present document is to be used in conjunction with: 

• EG 202 396-1 [i.2] which describes a recording and reproduction setup for realistic simulation of background 
noise scenarios in lab-type environments for the performance evaluation of terminals and communication 
systems. 

• EG 202 396-2 [i.3] which describes the simulation of network impairments and how to simulate realistic 
transmission network scenarios and which contains the methodology and results of the subjective scoring for 
the data forming the basis of the present document. 

• EG 202 396-3 [i.4] which describes the basic objective model underlying the model described in the present 
document. 

• American English speech sentences as enclosed in the present document. 



2 References 

References are either specific (identified by date of publication and/or edition number or version number) or 
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the 
referenced document (including any amendments) applies. 

Referenced documents which are not found to be publicly available in the expected location might be found at 
http://docbox.etsi.org/Reference . 

NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee 
their long term validity. 

2.1 Normative references 

The following referenced documents are necessary for the application of the present document. 

Not applicable. 

2.2 Informative references 

The following referenced documents are not necessary for the application of the present document but they assist the 
user with regard to a particular subject area. 

[i.1] 3GPP S4-120542: "Common subjective testing framework for training of P.835 test predictors". 

[i.2] ETSI EG 202 396-1: "Speech and multimedia Transmission Quality (STQ); Speech quality performance in the 
presence of background noise; Part 1: Background noise simulation technique and background noise database". 

[i.3] ETSI EG 202 396-2: "Speech Processing, Transmission and Quality Aspects (STQ); Speech Quality performance in 
the presence of background noise; Part 2: Background Noise Transmission - Network Simulation - Subjective Test 
Database and Results". 

[i.4] ETSI EG 202 396-3: "Speech and multimedia Transmission Quality (STQ); Speech Quality performance in the 
presence of background noise Part 3: Background noise transmission - Objective test methods". 

[i.5] ETSI TS 126 073: "Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications 
System (UMTS); LTE; ANSI C code for the Adaptive Multi Rate (AMR) speech codec (3GPP TS 26.073)". 

[i.6] Recommendation ITU-T P.835: "Subjective test methodology for evaluating speech communication systems that 
include noise suppression algorithm". 

[i.7] Recommendation ITU-T G.722.2: "Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate 
Wideband (AMR-WB)". 

[i.8] Recommendation ITU-T P.56: "Objective measurement of active speech level". 

[i.9] Recommendation ITU-T P.1401: "Methods, metrics and procedures for statistical evaluation, qualifying and 
comparison of objective quality prediction models". 

[i.10] Recommendation ITU-T G.160 Appendix II, Amendment 2: "Voice enhancement devices: Revised Appendix II - 
Objective measures for the characterization of the basic functioning of noise reduction algorithms". 

[i.11] Recommendation ITU-T G.191: "Software tools for speech and audio coding standardization". 

[i.12] Hastie, T.; Tibshirani, R.; Friedman, J.: "The Elements of Statistical Learning: Data Mining, Inference, and 
Prediction", New York: Springer-Verlag, 2001. 

[i.13] Recommendation ITU-T P.501: "Test Signals for Use in Telephonometry". 

[i.14] Recommendation ITU-T P.58: "Head and Torso simulator for telephonometry". 

[i.15] Recommendation ITU-T P.57: "Artificial ears". 

[i.16] ETSI TS 126 131: "Universal Mobile Telecommunications System (UMTS); LTE; Terminal acoustic characteristics 
for telephony; Requirements (3GPP TS 26.131 version 10.2.0 Release 10)". 

[i.17] Recommendation ITU-T P.800: "Methods for subjective determination of transmission quality". 

[i.18] ETSI TS 126 132: "Universal Mobile Telecommunications System (UMTS); LTE; Speech and video telephony 
terminal acoustic test specification (3GPP TS 26.132)". 

[i.19] ETSI ES 202 396-1: "Speech and multimedia Transmission Quality (STQ); Speech quality performance in the 
presence of background noise; Part 1: Background noise simulation technique and background noise database". 

[i.20] Recommendation ITU-T TD 477 (GEN/12): "Handbook of subjective test practical procedures" (temporary 
document) - Geneva, 18-27 January 2011. 

[i.21] AH-11-029, Better Reference System for the P.835 SIG Rating Scale, Q7/12 Rapporteur's meeting, 
20-21 June 2011, Geneva, Switzerland. 

[i.22] 3GPP, Tdoc S4(12)0621, Ext-ATS Permanent document (EATS-3): "Common subjective testing framework for 
validation of P.835 test predictors". 



3 Abbreviations 

For the purposes of the present document, the following abbreviations apply: 



AMR       Adaptive Multi-Rate 
AMR-WB    Adaptive Multi-Rate Wideband Speech Codec 
BAK       Background Noise Component 
dBSPL     Sound Pressure Level re 20 µPa in dB 
DRP       Drum Reference Point 
DTX       Discontinuous Transmission 
G-MOS     Global MOS 

          NOTE: MOS related to the overall sample. 

HHHF      Hand-Held Hands-Free 
IRS       Intermediate Reference System 
ITU       International Telecommunication Union 
ITU-T     Telecommunication Standardization Sector of ITU 
MOS       Mean Opinion Score 
MRP       Mouth Reference Point 
MSIN      Mobile Station Input Filter 
NB        NarrowBand 
N-MOS     Noise MOS 

          NOTE: MOS related to the noise transmission only. 

NS        Noise Suppression 
OVRL      Overall (speech + noise) Component 
RCV       ReCeiVe 
RMSE      Root Mean Square Error 
RMSE*     epsilon insensitive Root Mean Square Error 
SIG       SIGnal component 
S-MOS     Speech MOS 

          NOTE: MOS related to the speech signal only. 

SND       Sending Direction 
SNR       Signal to Noise Ratio 
SPL       Sound Pressure Level 
WB        WideBand 
WCDMA     Wideband Code Division Multiple Access 



4 Introduction 

The present document describes the modifications of the EG 202 396-3 [i.4] model which were necessary to adapt it to 
the training databases provided by the 3GPP contributors listed in annex A. The core model itself remains largely 
unmodified, except for the points given in the clauses below. The modifications affect the narrowband and wideband 
modes in different ways. 

The adapted objective method described in the present document is intended to be used for all types of modern mobile 
terminals using different bitrates of AMR [i.5] and AMR-WB [i.7] coding. 
5 Underlying speech databases and preparations 

The basis for each mode of the objective model (wideband/narrowband) as described in EG 202 396-3 [i.4] are listening 
tests conducted according to Recommendation ITU-T P.835 [i.6]. From the beginning of the development, these 
listening test databases were designed to be a training set for predicting Recommendation ITU-T P.835 [i.6] scores. 
They included a large number of conditions (> 170) and a wide range of speech and noise quality. Besides real 
terminals, terminal simulations and transmission impairments were also included. However, the data and processing 
included were based on the technologies current at the time when the standard and its updates were created. 

The underlying databases for the retraining as described in the present document were created using real state-of-the-art 
mobile devices and thus the resulting quality ranges may not be normally distributed over all MOS scales. The context 
can also differ between the databases (e.g. pure handset recordings vs. mixed handset/hands-free databases). 
Furthermore, new reference conditions, extensively discussed in different standards groups and described in [i.1], were 
included in the tests. 

Table 1: Set of reference conditions 

File   SIG.                SNR        Noise Type 
i01    Source (filtered)   No Noise   - 
i02    Source (filtered)   0 dB       Fullsize_Car1_130Kmh_binaural 
i03    Source (filtered)   12 dB      Fullsize_Car1_130Kmh_binaural 
i04    Source (filtered)   24 dB      Fullsize_Car1_130Kmh_binaural 
i05    Source (filtered)   36 dB      Fullsize_Car1_130Kmh_binaural 
i06    NS Level 1          No Noise   - 
i07    NS Level 2          No Noise   - 
i08    NS Level 3          No Noise   - 
i09    NS Level 4          No Noise   - 
i10    NS Level 3          24 dB      Fullsize_Car1_130Kmh_binaural 
i11    NS Level 2          12 dB      Fullsize_Car1_130Kmh_binaural 
i12    NS Level 1          [0 dB]     Fullsize_Car1_130Kmh_binaural 



Each training database was provided together with 12 reference conditions, mainly created according to the annex of 
[i.1]; table 1 shows one possible arrangement. Although not all reference sets included exactly the same speech 
material, background noise, SNR ranges and speech distortion configuration, these data indicate which range of speech 
and noise degradations can be expected in the databases. 

To bring the different databases onto an at least approximately common basis for the retraining of the model, the 
12 x 3 values of the reference conditions (averaged over all samples) were used to linearly transform the subjective 
MOS data. In a first step, the reference conditions of all databases included in the retraining process were combined 
by weighted averaging into an average reference condition set. The weight per database depends on the number of 
samples it provides for the training. 

For each database, a mapping between its reference conditions and the average reference condition set is calculated. 
To also capture inter-relations between speech, noise and global ratings, a matrix transformation was chosen instead 
of a per-scale regression. To compensate biases, a constant column was added to the reference set. Then a 
transformation T_j is calculated for each database j with reference set R_j which minimizes the distance to the 
average reference set A: 



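$$
\underbrace{\begin{pmatrix}
1 & S_{j,i01} & N_{j,i01} & G_{j,i01} \\
\vdots & \vdots & \vdots & \vdots \\
1 & S_{j,i12} & N_{j,i12} & G_{j,i12}
\end{pmatrix}}_{R_j \text{ (Ref. set } j\text{)}}
\times T_j =
\underbrace{\begin{pmatrix}
\bar{S}_{i01} & \bar{N}_{i01} & \bar{G}_{i01} \\
\vdots & \vdots & \vdots \\
\bar{S}_{i12} & \bar{N}_{i12} & \bar{G}_{i12}
\end{pmatrix}}_{A \text{ (Avg. ref. set)}}
\tag{1}
$$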



The transformation matrix T_j (size 4 x 3) can easily be determined as: 

$$
T_j = \left(R_j^{T} \times R_j\right)^{-1} \times R_j^{T} \times A
\tag{2}
$$
If the three scales (S-MOS/N-MOS/G-MOS) are independent from each other for any database, the matrix 
transformation T_j equals a linear per-scale transformation. Before the retraining of the model, the transformation is 
applied to the whole test data on a per-sample basis: 



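$$
\underbrace{\begin{pmatrix}
1 & S_1 & N_1 & G_1 \\
\vdots & \vdots & \vdots & \vdots \\
1 & S_N & N_N & G_N
\end{pmatrix}}_{S_j \text{ (scores of samples of database } j\text{)}}
\times T_j =
\underbrace{\tilde{S}_j}_{\text{transformed scores of samples of database } j}
\tag{3}
$$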
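The steps behind equations (1) to (3) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation used for the retraining: the function names are hypothetical, each reference set is assumed to be a 12 x 3 array of (S-MOS, N-MOS, G-MOS) values, and the weights of the average reference set are assumed to be proportional to the number of samples per database.

```python
import numpy as np


def weighted_average_reference(ref_sets, n_samples):
    """Average reference set A; weights assumed proportional to sample counts."""
    weights = np.asarray(n_samples, dtype=float)
    stacked = np.asarray(ref_sets, dtype=float)        # shape (K, 12, 3)
    return np.tensordot(weights / weights.sum(), stacked, axes=1)


def fit_transformation(ref_scores, avg_ref):
    """Least-squares estimate of T_j (4 x 3) so that R_j @ T_j approximates A."""
    ref_scores = np.asarray(ref_scores, dtype=float)   # shape (12, 3)
    avg_ref = np.asarray(avg_ref, dtype=float)         # shape (12, 3)
    # Prepend the constant column that compensates biases, see equation (1).
    R = np.hstack([np.ones((ref_scores.shape[0], 1)), ref_scores])
    # lstsq yields the same solution as (R^T R)^-1 R^T A in equation (2),
    # without forming the inverse explicitly.
    T, *_ = np.linalg.lstsq(R, avg_ref, rcond=None)
    return T


def apply_transformation(sample_scores, T):
    """Apply T_j to per-sample scores (N x 3), see equation (3)."""
    sample_scores = np.asarray(sample_scores, dtype=float)
    S = np.hstack([np.ones((sample_scores.shape[0], 1)), sample_scores])
    return S @ T
```

If the reference set of a database differs from the average set only by a per-scale gain and offset, the fitted T_j has the offsets in its first row and a diagonal block below it, which is the per-scale special case mentioned in the text.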



6 Modifications to the model described in EG 202 396-3 

6.1 Prefiltering in Narrowband Mode (NB) 

In the narrowband mode described in EG 202 396-3 [i.4], the listening test audio files included a far-end handset 
simulation, realized with an IRS RCV filter. In the requirements described in [i.4], such a listening filter was 
neither described nor used in the databases, for narrowband or for wideband. 

The narrowband mode internally filters the unprocessed and clean reference with IRS SND and IRS RCV to simulate a 
transmission over high-quality listening devices and network. The principle of IRS seems to be outdated, since modern 
state-of-the-art mobiles do not have this frequency characteristic. This applies even more to the newly created NB 
databases, where the devices used have almost flat frequency responses in sending direction. 

Thus the filtering of the two reference signals with IRS SND and RCV was replaced by filtering with the MSIN [i.11] 
filter, which is mainly a band pass. In addition, no listening filter was applied to the processed signals. 

6.2 Detection of the speech parts 

The detection of signal parts belonging to either speech or noise was updated. Now the clean speech signal is segmented 
into frames and classified according to Recommendation ITU-T G.160 [i.10]. The signal parts classified as silence are 
assumed to be background noise sections, all other frames are assumed to be speech. 

6.3 Speech level adjustment in wideband 

The current EG 202 396-3 [i.4] implementation assumes an active speech level of 79 dB SPL / -15 dBPa, which results 
from the listening tests underlying the subjective databases of the wideband model of EG 202 396-3 [i.4]. 

For the objective model as described in the present document, the level of the recordings of the training databases 
was adjusted in such a way that the active speech level over the full test sequence is about 
73 dB SPL / -21 dBPa (for the listening test) as described in [i.4]. 
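The relation between the two level notations, and the gain needed to reach the target level, can be illustrated as follows. This is a minimal sketch: the function names are hypothetical, and the measured active speech level is assumed to be available already (for example from a Recommendation ITU-T P.56 [i.8] measurement, which is not shown here).

```python
import math

P_REF = 20e-6  # reference sound pressure of dB SPL in Pa (20 µPa)


def db_spl_to_db_pa(level_db_spl):
    """Convert a level in dB SPL (re 20 µPa) to dBPa (re 1 Pa)."""
    return level_db_spl - 20.0 * math.log10(1.0 / P_REF)


def gain_to_target_db(measured_db_pa, target_db_pa=-21.0):
    """Gain in dB that moves the measured active speech level to the target."""
    return target_db_pa - measured_db_pa


def apply_gain(samples, gain_db):
    """Scale linear sample values by a gain given in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]
```

Since 1 Pa corresponds to roughly 94 dB SPL, 73 dB SPL maps to about -21 dBPa and 79 dB SPL to about -15 dBPa; a recording measured at -15 dBPa would therefore need a gain of -6 dB.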

6.4 Replacement of parameter regression for S-MOS 

The model described in EG 202 396-3 [i.4] calculates several parameters out of the psycho-acoustically motivated inner 
representation for the estimation of S- and N-MOS. The parameters are shown in tables 2 and 3. A detailed description 
of the calculation for the parameters can be found in [i.4]. 
Table 2: Extracted parameters for N-MOS 

P1   N_BGN,p              P4   μ(RA_BGN,u) 
P2   μ(RA_BGN,p)          P5   σ²(RA_BGN,u) 
P3   σ²(RA_BGN,p)         P6   σ²(ΔRA_BGN,p-u) 

Table 3: Extracted parameters for S-MOS 

P1   ΔSNR                 P4   μ(ΔRA_sp,p-c) 
P2   μ(RA_sp,p)           P5   σ²(ΔRA_sp,p-c) 
P3   μ(ΔRA_sp,p-u)        P6   σ²(ΔRA_sp,p-u) 



The calculation of the objective S-MOS in clause 6.5.2 of [i.4] is performed with a linear quadratic regression of the 
parameters mentioned above. In addition, the regression coefficients are switched depending on the previously 
calculated N-MOS, which models the listener's expectation of speech quality [i.4]. 

The applied modification is the replacement of the linear quadratic regression with a feed-forward neural network. As 
a consequence, the switching of the regression coefficients depending on the N-MOS is removed. Only one network is 
trained with input (the 6 parameters of table 3) and output (S-MOS) data by a simple back-propagation algorithm [i.12]. 



[Figure: the six input parameters ΔSNR(Proc-Unproc), μ(RA_sp,Proc) [cPa], μ(ΔRA_sp,Proc-Clean) [cPa], 
σ²(ΔRA_sp,Proc-Clean) [cPa], μ(ΔRA_sp,Proc-Unproc) [cPa] and σ²(ΔRA_sp,Proc-Unproc) [cPa] pass through sigmoid 
functions into the hidden neuron layer, whose outputs are combined in the output neuron layer to S-MOS.] 

Figure 1: Structure of neural network for S-MOS 






The setup of the neural network is shown in figure 1. It consists of 5 units in one hidden layer; each unit N_i includes 
a connection from each transformed input parameter I_j. The output O_i of each unit is calculated as the weighted sum 
of the inputs I_j using the weights w_ij. The outputs O_i are then weighted by W_i and summed up to the output S-MOS. 
Both w_ij and W_i are the result of the training of the network. 
The parameters according to table 3 are combined into a vector P including a bias as the first element: 

$$
P = (1 \quad P_1 \quad P_2 \quad P_3 \quad P_4 \quad P_5 \quad P_6)
\tag{4}
$$

The output calculation of the neural network shown in figure 1 can be described as concatenated matrix operations: 

$$
SMOS_{objective,raw} = f_{sigmoid}\left( f_{sigmoid}\left( \frac{P - M_{in}}{S_{in}} \right) \times H \right) \times O
\tag{5}
$$

First the parameter vector P is normalized to mean 0,0 and standard deviation 1,0. This is done by subtracting the 
average of all training data for each parameter from each item of the input parameter vector. The averages for each 
parameter Pj can be described as a vector, which is different for narrow- and wideband mode: 

M_in,WB = (0,0   12,7309   4,2076   -1,2456   0,8834   12,2522   7,0541) 

M_in,NB = (0,0   13,7519   2,0884   -0,3124   0,2511   6,7091   5,2951) 

(6) 

NOTE 1: The first element is set to zero to be compatible with the bias element in P. 

A similar approach can be made for the standard deviation for each parameter P_j, also separated for wide- and 
narrowband: 

S_in,WB = (1,0   11,8503   1,2824   1,1981   0,9572   6,7848   4,8380) 

S_in,NB = (1,0   11,4341   0,4047   0,3877   0,3309   3,1189   2,5976) 

(7) 

NOTE 2: The first element is set to one to be compatible with the bias element in P. 

After normalizing the input data, the sigmoid function f_sigmoid(x) is applied to each normalized parameter P_j. This 
ensures that each input of each neuron of the hidden layer is soft-limited to the range ±1,0 and guarantees that 
parameters outside the training range cannot produce an overflow, which would eventually result in unreasonable 
scores. For the current model, the hyperbolic tangent was chosen as the sigmoid function: 

$$
f_{sigmoid}(x) = \tanh(x)
\tag{8}
$$

Thus the input of the hidden neuron layer can also be given as a transformed parameter vector \tilde{P}: 

$$
\tilde{P} = f_{sigmoid}\left( \frac{P - M_{in}}{S_{in}} \right)
= (1 \quad \tilde{P}_1 \quad \tilde{P}_2 \quad \tilde{P}_3 \quad \tilde{P}_4 \quad \tilde{P}_5 \quad \tilde{P}_6)
\tag{9}
$$

NOTE 3: The sigmoid function is not applied to the bias component. 



The output of the hidden layer is calculated with a matrix multiplication of P and H. H describes all weights from each 
input parameter to each neuron in the hidden layer. These weights are the results of the training with the 
back-propagation algorithm. In consequence, H is different for each bandwidth mode: 

         ( -0,4336   -0,9873    0,0091   -0,0845    0,0203 ) 
         (  0,1141   -0,0004   -0,7133   -0,2798   -1,8189 ) 
         (  1,0265    0,5001    0,5120    0,0537    0,1265 ) 
H_WB  =  ( -0,8627   -1,7518   -0,0374   -0,2908    0,3064 ) 
         (  2,1381    0,4190    1,0715   -1,6716    0,4973 ) 
         ( -1,3933    0,5972    0,0852    0,1977    0,2222 ) 
         ( -0,3793   -1,7785   -0,5306   -1,7538   -2,9630 ) 

(10) 

         ( -0,3608   -0,3805    0,5359   -1,1131   -0,1322 ) 
         (  0,7348   -4,4639   -1,2552    0,3338    0,5452 ) 
         (  0,9117    2,7177    0,8876    0,1712   -2,1279 ) 
H_NB  =  ( -0,2383    1,7228   -0,0354   -1,0284    1,0483 ) 
         (  1,4511    2,1467    1,0010    0,7356    0,1154 ) 
         ( -0,5573   -0,6137   -0,2648    1,6202    0,5966 ) 
         ( -3,2194   -7,9575   -0,7736   -0,8676    0,1663 ) 

(11) 

The outputs of the hidden layer are then again soft-limited with the same sigmoid function to assure a valid range 
(±1,0) for the output neuron layer. The five transformed output values of the hidden layer are then given to the 
output layer. Here the output of the neural network is calculated with another matrix multiplication with the matrix O, 
which weights the outputs of the hidden layer to an output score SMOS_objective,raw. This output layer matrix O is also 
given for wide- and narrowband mode independently: 

O_WB = (0,1777   -0,2835   -0,3147   0,1837   -0,3237) 

O_NB = (0,3832   -0,5250   -0,1878   -0,2674   -0,1548) 

(12) 

Another part of the back-propagation algorithm is to normalize the output data to mean 0,0 and standard deviation 
1,0. To reverse this step and transform the output of the neural network back to the MOS scale, the objective S-MOS is 
calculated from the raw score: 

$$
SMOS_{objective} = \max\left(1{,}0,\ \min\left(S_{out} \cdot SMOS_{objective,raw} + M_{out},\ 5{,}0\right)\right)
\tag{13}
$$

The objective S-MOS is calculated with M_out = 3,0, S_out = 2,0 and a hard limiter [1,0; 5,0]. 
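The chain of operations in equations (4) to (13) can be sketched as follows. This is an illustration only, under these assumptions: the function name is hypothetical, H is a 7 x 5 matrix (bias row first) and O a vector of 5 output weights supplied by the caller, and the raw score is assumed to be rescaled as S_out · raw + M_out before the hard limiter is applied. The wideband normalization vectors are copied from equations (6) and (7).

```python
import numpy as np

# Normalization vectors for wideband mode, equations (6) and (7).
# The first element belongs to the bias slot of P.
M_IN_WB = np.array([0.0, 12.7309, 4.2076, -1.2456, 0.8834, 12.2522, 7.0541])
S_IN_WB = np.array([1.0, 11.8503, 1.2824, 1.1981, 0.9572, 6.7848, 4.8380])

M_OUT = 3.0  # mean used to de-normalize the network output
S_OUT = 2.0  # standard deviation used to de-normalize the network output


def smos_from_parameters(params, m_in, s_in, H, O):
    """Objective S-MOS from the six table 3 parameters.

    H (7 x 5) and O (5,) are the trained weights; they are passed in by the
    caller and are not defined in this sketch.
    """
    P = np.concatenate(([1.0], np.asarray(params, dtype=float)))  # eq. (4)
    z = (P - m_in) / s_in            # normalization; the bias element stays 1
    z[1:] = np.tanh(z[1:])           # eq. (9): sigmoid skipped for the bias
    hidden = np.tanh(z @ np.asarray(H, dtype=float))   # hidden layer outputs
    raw = float(hidden @ np.asarray(O, dtype=float))   # eq. (5)
    return max(1.0, min(S_OUT * raw + M_OUT, 5.0))     # assumed form of eq. (13)
```

With all weights in H set to zero (a purely hypothetical placeholder), every hidden unit outputs tanh(0) = 0 and the function returns M_out = 3,0; very large positive or negative bias weights drive the result into the hard limits 5,0 and 1,0.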



6.5 Retraining of parameter regression for N-MOS and G-MOS 

The objective N-MOS is the result of a linear, quadratic regression algorithm applied to the six parameters of table 2 
according to equation (14): 

$$
NMOS = c_0 + \sum_{j=1}^{2} \sum_{i=1}^{6} c_{i,j} \cdot P_i^{\,j}
\tag{14}
$$

The overall or global quality G-MOS is calculated by using the previously calculated N-MOS and S-MOS as input 
parameters for a linear quadratic regression according to equation (15): 

$$
GMOS = c_0 + \sum_{j=1}^{2} c_{S,j} \cdot SMOS^{\,j} + \sum_{j=1}^{2} c_{N,j} \cdot NMOS^{\,j}
\tag{15}
$$



ETSI TS 103 106 V1.2.1 (2013-03) 



The calculation steps for N-MOS and G-MOS are not modified, only the coefficients for the linear regressions 
according to equations (14) and (15) are adapted to the new training material. The new coefficients are given in tables 4 
to 7: 

Table 4: N-MOS coefficients for narrowband; Parameters P_i according to table 2 

              Bias      P1        P2        P3        P4        P5        P6 
Order j = 1   2,2231    -0,0395   -0,0359   0,2825    0,0023    -0,3959   -2,6965 
Order j = 2   -         -         0,0021    -0,0239   -0,0003   0,0542    0,8684 



Table 5: N-MOS coefficients for wideband; Parameters P_i according to table 2 

              Bias      P1        P2        P3        P4        P5        P6 
Order j = 1   1,4279    -0,0484   0,0994    0,2189    -0,0732   -0,3346   -1,3108 
Order j = 2   -         -         -0,0018   -0,0079   0,0011    0,0891    0,2566 



Table 6: G-MOS coefficients for narrowband 

              Bias      S-MOS     N-MOS 
Order j = 1   -0,4879   0,2647    0,8274 
Order j = 2   -         0,0696    -0,0737 



Table 7: G-MOS coefficients for wideband 

              Bias      S-MOS     N-MOS 
Order j = 1   -0,2141   0,2735    0,4542 
Order j = 2   -         0,0708    -0,0065 
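As a worked illustration of equations (14) and (15), the following sketch evaluates both regressions with the coefficients of tables 4, 6 and 7. The function and constant names are hypothetical, table entries marked "-" are treated as absent terms (coefficient 0,0), and no range limiting is applied to the results because the text does not state one for N-MOS and G-MOS.

```python
# G-MOS coefficients from tables 6 and 7: bias, (c_S1, c_S2), (c_N1, c_N2).
GMOS_COEFFS = {
    "NB": (-0.4879, (0.2647, 0.0696), (0.8274, -0.0737)),
    "WB": (-0.2141, (0.2735, 0.0708), (0.4542, -0.0065)),
}

# N-MOS coefficients for narrowband from table 4: bias, order 1, order 2.
# Entries given as "-" in the table are represented by 0.0 here.
NMOS_NB_BIAS = 2.2231
NMOS_NB_ORDER1 = (-0.0395, -0.0359, 0.2825, 0.0023, -0.3959, -2.6965)
NMOS_NB_ORDER2 = (0.0, 0.0021, -0.0239, -0.0003, 0.0542, 0.8684)


def quadratic_regression(bias, values, order1, order2):
    """bias + sum_i (c_i1 * x_i + c_i2 * x_i^2), as in equations (14), (15)."""
    return bias + sum(
        c1 * x + c2 * x * x for x, c1, c2 in zip(values, order1, order2)
    )


def nmos_nb(params):
    """Narrowband N-MOS from the six table 2 parameters P1..P6."""
    return quadratic_regression(NMOS_NB_BIAS, params, NMOS_NB_ORDER1, NMOS_NB_ORDER2)


def gmos(smos, nmos, mode):
    """G-MOS from S-MOS and N-MOS for mode 'NB' or 'WB'."""
    bias, (cs1, cs2), (cn1, cn2) = GMOS_COEFFS[mode]
    return quadratic_regression(bias, (smos, nmos), (cs1, cn1), (cs2, cn2))
```

For example, S-MOS = 3,0 and N-MOS = 3,0 give a G-MOS of about 2,75 in narrowband and about 2,55 in wideband mode.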



7 Comparison of objective and subjective results after the training process 

The comparison between the results of the subjective tests and the objective prediction of the conditions used in the 
training process is given in this clause. The metrics used in the statistical evaluation process are derived from 
Recommendation ITU-T P.1401 [i.9]. Besides the RMSE or RMSE* values, the different metrics and scatterplots are 
given in this clause. 
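The two error metrics can be sketched as below. The definitions are assumptions based on the commonly used form in Recommendation ITU-T P.1401 [i.9] and should be checked against that Recommendation: the squared errors are divided by N - d, where d is the number of degrees of freedom of the mapping function, and for RMSE* the absolute error of each condition is reduced by the 95 % confidence interval of its subjective score before squaring. The function names are hypothetical and the confidence intervals are taken as given inputs.

```python
import math


def rmse(subjective, predicted, d=1):
    """Root mean square error over N conditions with N - d in the denominator."""
    n = len(subjective)
    squared = sum((s - p) ** 2 for s, p in zip(subjective, predicted))
    return math.sqrt(squared / (n - d))


def rmse_star(subjective, predicted, ci95, d=1):
    """Epsilon-insensitive RMSE: errors inside the confidence interval count as 0."""
    n = len(subjective)
    squared = sum(
        max(0.0, abs(s - p) - c) ** 2
        for s, p, c in zip(subjective, predicted, ci95)
    )
    return math.sqrt(squared / (n - d))
```

Because errors smaller than the confidence interval are ignored, RMSE* is never larger than RMSE for the same data and the same d.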

A summary of the databases and the conditions used for retraining is given in annex A. 



7.1 Results in wideband mode 



For the wideband retraining procedure two databases were not included within the training for several reasons. Removal 
of these databases significantly increases the performance. Further analysis is required why these databases seem to be 
"incompatible" with the remaining training set. 

Overall, 7 databases with 387 conditions and 5 544 samples were used. 






7.1.1 Results for database "Audience - Test 3" 

7.1.2 Results for database "Audience - Test 3L" (excluded during retraining) 

7.1.3 Results for database "Audience - Test 4" 

7.1.4 Results for database "Audience - Test 4L" 

7.1.5 Results for database "Nokia - Test 1" 

7.1.6 Results for database "Nokia - Test 2" (excluded during retraining) 

7.1.7 Results for database "Orange" 

7.1.8 Results for database "Qualcomm - Test 3" 

7.1.9 Results for database "Qualcomm - Test 4" 
7.2 Results in narrowband mode 

For the narrowband retraining procedure, no database was excluded. 
Overall, 6 databases with 288 conditions and 3 840 samples were used. 






7.2.1 Results for database "Audience - Test 1" 

7.2.2 Results for database "Audience - Test 1L" 

7.2.3 Results for database "Audience - Test 2" 

7.2.4 Results for database "Audience - Test 2L" 

7.2.5 Results for database "Qualcomm - Test 1" 

7.2.6 Results for database "Qualcomm - Test 2" 
8 Validation results 



For the validation of the model different databases were provided. The databases included different types of conditions 
and different terminals and simulations. The details of the validation databases are described separately for each set of 
databases provided by the validation labs. 
8.1 Audience validation data 



8.1.1 Description of tests 



Four tests were conducted, two narrowband (5 and 6) and two wideband (7 and 8). In each test, the noise types listed in 
[i.1] were used, but the noise levels were increased by 6 dB as in five of the training databases. Six different devices, 
new to this sequence of validation tests, were used, again a mix of commercial and simulated handsets. All devices were 
tested in both handset and handheld speakerphone use cases, counterbalanced between the pair of tests at a given 
bandwidth. 

Devices 

In each experiment, six devices were evaluated, the maximum number allowed in the EATS-3 [i.1] test plan. In each 
experiment at one bandwidth, half of the devices were tested in handset mode and half in handheld speakerphone 
mode, in order to provide a consistent and wide range of listening conditions. Across the two tests at each bandwidth, 
all six devices were thus tested in both handset and handheld speakerphone modes. The devices included a mix of real 
and simulated devices with both 1- and 2-microphone noise suppression systems. 

The reference conditions and noise types are as defined in table 1 of [i.1] (see table 7a below). 

Table 7a 

Reference Conditions 

File   SIGNAL              SNR        Noise Type 
i01    Source (filtered)   No Noise   - 
i02    Source (filtered)   0 dB       Fullsize_Car1_130Kmh_binaural 
i03    Source (filtered)   12 dB      Fullsize_Car1_130Kmh_binaural 
i04    Source (filtered)   24 dB      Fullsize_Car1_130Kmh_binaural 
i05    Source (filtered)   36 dB      Fullsize_Car1_130Kmh_binaural 
i06    NS Level 1          No Noise   - 
i07    NS Level 2          No Noise   - 
i08    NS Level 3          No Noise   - 
i09    NS Level 4          No Noise   - 
i10    NS Level 3          24 dB      Fullsize_Car1_130Kmh_binaural 
i11    NS Level 2          12 dB      Fullsize_Car1_130Kmh_binaural 
i12    NS Level 1          [0 dB]     Fullsize_Car1_130Kmh_binaural 

Test Conditions 

File   Speech level @ MRP     Noise level @ HATS ear simulators   Noise Type                              Description of noise 
       (handset/handsfree)    with ID correction                                                          from EG 202 396-1 [i.2] 
i13    -1,7 / +1,3 dBPa       L: 75,0 dB(A) / R: 73,0 dB(A)       Pub_Noise_binaural_V2                   Recording in a pub 
i14    -1,7 / +1,3 dBPa       L: 74,9 dB(A) / R: 73,9 dB(A)       Outside_Traffic_Road_binaural           Recording at pavement 
i15    -1,7 / +1,3 dBPa       L: 69,1 dB(A) / R: 69,6 dB(A)       Outside_Traffic_Crossroads_binaural     Recording at pavement 
i16    -1,7 / +1,3 dBPa       L: 68,2 dB(A) / R: 69,8 dB(A)       Train_Station_binaural                  Recording at departure platform 
i17    -1,7 / +1,3 dBPa       L: 69,1 dB(A) / R: 68,1 dB(A)       Fullsize_Car1_130Kmh_binaural           Recording in passenger cabin 
i18    -1,7 / +1,3 dBPa       L: 68,4 dB(A) / R: 67,3 dB(A)       Cafeteria_Noise_binaural                Recording at sales counter 
i19    -1,7 / +1,3 dBPa       L: 63,4 dB(A) / R: 61,9 dB(A)       Mensa_binaural                          Recording in a cafeteria 
i20    -1,7 / +1,3 dBPa       L: 56,6 dB(A) / R: 57,8 dB(A)       Work_Noise_Office_Callcenter_binaural   Recording in a business office 



However, as noted above for these tests, the noise levels were increased by 6 dB as was done in five of the training 
databases. 
8.1.2 Description of validation results 

For each test, three scatter plots are shown, plotting the results of the predictions versus the subjective data. In each 
plot, three sets of data are shown, one for no mapping, one for a first-order remapping, and one for a third-order 
remapping. Tables of correlation, RMSE, and RMSE* [i.9] follow each set of scatter plots. The 1st and 3rd order 
remappings were derived for each experiment from the 48 test conditions, according to the procedure defined in [i.9]. 
The intention behind showing scatter plots for the three mapping cases is to demonstrate visually that there is only a 
small impact of the remapping procedure for these data. 
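A first-order or third-order remapping of this kind can be sketched with an ordinary polynomial least-squares fit. This is a simplified illustration, not the procedure of [i.9]: the function names are hypothetical, and `np.polyfit` does not enforce the monotonicity that a remapping function is normally expected to have, so a separate check over the MOS range is included.

```python
import numpy as np


def fit_mapping(predicted, subjective, order):
    """Fit a polynomial of the given order that maps predictions to subjective MOS."""
    coeffs = np.polyfit(predicted, subjective, order)
    return np.poly1d(coeffs)


def is_monotonic(mapping, low=1.0, high=5.0, points=401):
    """Check that the mapping does not decrease anywhere on [low, high]."""
    grid = np.linspace(low, high, points)
    return bool(np.all(np.diff(mapping(grid)) >= 0.0))
```

The fitted function is applied to the objective scores of the 48 test conditions before the error metrics are recomputed; when the unmapped predictions already follow the subjective data closely, the mapped and unmapped results differ only slightly.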



8.1.2.1 Experiment 5: Narrowband 

Figure 2: Experiment 5 S-MOS scatter plot 

Figure 3: Experiment 5 N-MOS scatter plot 

Figure 4: Experiment 5 G-MOS scatter plot 

Table 8: Correlation, RMSE, and RMSE* for experiment 5 

Condition                   S-MOS   N-MOS   G-MOS 
Correlation                 0,96    0,97    0,94 
RMSE, no mapping            0,35    0,36    0,33 
RMSE, 1st order mapping     0,25    0,20    0,27 
RMSE, 3rd order mapping     0,22    0,18    0,28 
RMSE*, no mapping           0,24    0,25    0,23 
RMSE*, 1st order mapping    0,14    0,12    0,20 
RMSE*, 3rd order mapping    0,12    0,10    0,20 



8.1.2.2 



Experiment 6: Narrowband 
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Figure 5: Experiment 6 G-MOS scatter plot 
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Figure 6: Experiment 6 N-IUIOS scatter plot 
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Figure 7: Experiment 6 G-IUIOS scatter plot 
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Table 9: Correlation, RMSE, and RMSE* for Experiment 6 



Condition                 | S-MOS | N-MOS | G-MOS
Correlation               | 0,93  | 0,97  | 0,93
RMSE, no mapping          | 0,38  | 0,28  | 0,35
RMSE, 1st order mapping   | 0,32  | 0,22  | 0,28
RMSE, 3rd order mapping   | 0,32  | 0,20  | 0,28
RMSE*, no mapping         | 0,28  | 0,18  | 0,25
RMSE*, 1st order mapping  | 0,22  | 0,14  | 0,19
RMSE*, 3rd order mapping  | 0,22  | 0,12  | 0,20



8.1.2.3 Experiment 7: Wideband

Figure 8: Experiment 7 S-MOS scatter plot
Figure 9: Experiment 7 N-MOS scatter plot
Figure 10: Experiment 7 G-MOS scatter plot
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Table 10: Correlation, RMSE, and RMSE* for Experiment 7 



Condition                 | S-MOS | N-MOS | G-MOS
Correlation               | 0,90  | 0,96  | 0,89
RMSE, no mapping          | 0,46  | 0,29  | 0,39
RMSE, 1st order mapping   | 0,37  | 0,24  | 0,35
RMSE, 3rd order mapping   | 0,36  | 0,22  | 0,36
RMSE*, no mapping         | 0,36  | 0,20  | 0,32
RMSE*, 1st order mapping  | 0,26  | 0,13  | 0,26
RMSE*, 3rd order mapping  | 0,25  | 0,12  | 0,27



8.1.2.4 Experiment 8: Wideband

Figure 11: Experiment 8 S-MOS scatter plot
Figure 12: Experiment 8 N-MOS scatter plot
Figure 13: Experiment 8 G-MOS scatter plot
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Table 11 : Correlation, RMSE, and RMSE* for Experiment 8 



Condition                 | S-MOS | N-MOS | G-MOS
Correlation               | 0,87  | 0,97  | 0,90
RMSE, no mapping          | 0,45  | 0,24  | 0,31
RMSE, 1st order mapping   | 0,38  | 0,23  | 0,31
RMSE, 3rd order mapping   | 0,37  | 0,24  | 0,30
RMSE*, no mapping         | 0,32  | 0,14  | 0,20
RMSE*, 1st order mapping  | 0,26  | 0,14  | 0,20
RMSE*, 3rd order mapping  | 0,26  | 0,14  | 0,20



8.2 Orange validation data 
8.2.1 Description of tests 

The Orange validation database includes six wideband mobile devices and three noises from EG 202 396-1 [i.2], used at
nominal level (see table 12). The speech samples come from four talkers (two male and two female), with two sentences
for each talker. The resulting test conditions are summarized in table 13. Except for f3, all talkers come from
Recommendation ITU-T P.501 [i.13].

Table 12: Noise names and descriptions for Orange validation database 



Noise type | Description              | EG 202 396-1 [i.2] filename
Crossroad  | Recording at pavement    | Outside_Traffic_Crossroads_binaural
Mensa      | Recording in a cafeteria | Mensa_binaural
Pub        | Recording in a Pub       | Pub_Noise_binaural_V2



Table 13: Definition of test condition parameters for Orange WB validation test

Test conditions      | Number | Designation
Noises               | 3      | N1, N2, N3
SNR                  | 1      | Nominal level
Devices              | 6      | D1, ..., D6
Talkers              | 4      | m1, m2, f2, f3
Sentences per talker | 2      | s1, s2



All test conditions were processed with the 4 talkers and 2 sentences. Level adjustment was performed as described in
EATS-3.

Reference conditions which incorporate a spectral-subtraction-based distortion were included in the test and are listed
in table 14. These reference conditions are exactly the same as those provided in EATS-3, table 2 of [i.1].
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Table 14: Reference set conditions for wideband testing 



Reference Conditions

File | SIG.                                | SNR      | Noise Type
i01  | Source (filtered)                   | No Noise | -
i02  | Source (filtered)                   | 10 dB    | Outside_Traffic_Crossroads_binaural
i03  | Source (filtered)                   | 20 dB    | Outside_Traffic_Crossroads_binaural
i04  | Source (filtered)                   | 30 dB    | Outside_Traffic_Crossroads_binaural
i05  | Source (filtered)                   | 40 dB    | Outside_Traffic_Crossroads_binaural
i06  | NS Level 1, 2nd set of parameters   | No Noise | -
i07  | NS Level 2, 2nd set of parameters   | No Noise | -
i08  | NS Level 3, 2nd set of parameters   | No Noise | -
i09  | NS Level 4, 2nd set of parameters   | No Noise | -
i10  | NS Level 3, 2nd set of parameters   | 30 dB    | Outside_Traffic_Crossroads_binaural
i11  | NS Level 2, 2nd set of parameters   | 20 dB    | Outside_Traffic_Crossroads_binaural
i12  | NS Level 1, 2nd set of parameters   | 10 dB    | Outside_Traffic_Crossroads_binaural



8.2.2 Description of validation results 

Scatter plots on a per-condition basis are provided in figures 14 to 16; they show the distribution over the quality
range for the three dimensions (speech, noise and overall quality).

The RMSE and RMSE* performance parameters specified in [i.9] were computed. Results before mapping and after
monotonic 3rd order mapping are presented in tables 15 and 16 respectively. The Pearson correlation is reported in
table 17. These results meet the performance requirements specified for RMSE and RMSE* on the 3rd order
remapping, as given in [i.9].

Table 15: Statistical analysis results before mapping

      | S-MOS | N-MOS | G-MOS
RMSE  | 0,68  | 0,29  | 0,62
RMSE* | 0,58  | 0,23  | 0,53

Table 16: Statistical analysis results after monotonic 3rd order mapping

      | S-MOS | N-MOS | G-MOS
RMSE  | 0,38  | 0,23  | 0,29
RMSE* | 0,30  | 0,16  | 0,21

Table 17: Pearson correlation (after monotonic 3rd order mapping)

                                   | S-MOS | N-MOS | G-MOS
before mapping                     | 0,90  | 0,97  | 0,90
after monotonic 3rd order mapping  | 0,91  | 0,98  | 0,93

Figure 14: S-MOS scatter plot
Figure 15: N-MOS scatter plot
Figure 16: G-MOS scatter plot



8.3 Qualcomm validation data 
8.3.1 Description of tests 

Two narrowband experiments following the EATS-3 subjective test plan [i.1] were conducted. The test set-up,
background noise reproduction calibration and levels, noise types and convergence sequencing are according to the
EATS-3 subjective test plan [i.1], except where noted. The reference conditions are according to [i.1], table 1.

In the first validation experiment (Exp 6), 2 devices were tested with 7 noise types and a clean condition (no noise 
added). The devices were tested in the following modes: 

• Handset with AMR 12,2 kbps 

• Handset with AMR 5,9 kbps 

• Handheld Hands-free with AMR 5,9 kbps 

resulting in a total of 48 test conditions. AMR 5,9 kbps was included in order to increase the range of
degradations for the validation tests. Commercial devices in a call with a CMU200 network simulator were used.

In the second validation experiment (Exp 7), 1 device was tested with 7 noise types and a clean condition (no noise
added). The device was tested in the following modes:

• Handset with AMR 12,2 kbps

• Handset with AMR 5,9 kbps

• Handheld Hands-free with AMR 5,9 kbps

• Handset with AMR 12,2 kbps (Noise levels increased by 6 dB)

• Handset with AMR 5,9 kbps (Noise levels increased by 6 dB)

• Handheld Hands-free with AMR 5,9 kbps (Noise levels increased by 6 dB)

resulting in a total of 48 test conditions. A commercial device in a call with the CMU200 network simulator was used.

The same reference set (exact same signals) was used in the narrowband experiments reported in previous contributions
in order to keep consistency and facilitate any necessary mapping or normalization of the data.
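
EXAMPLE: The condition counts stated for the two experiments can be reproduced by enumerating the combinations of device, mode and noise condition. The sketch below is only an illustration: the device labels and the short noise labels are hypothetical placeholders, and the treatment of the clean condition at the raised level as a second clean recording is an assumption.

```python
from itertools import product

# Seven noise types plus one clean condition (short labels are placeholders).
noise_conditions = ["pub", "road", "crossroads", "clean",
                    "car", "cafeteria", "mensa", "office"]
modes = ["Handset AMR 12,2", "Handset AMR 5,9", "Handheld Hands-free AMR 5,9"]

# Exp 6: 2 devices x 3 modes x 8 noise conditions.
exp6 = list(product(["device_1", "device_2"], modes, noise_conditions))

# Exp 7: 1 device x 3 modes x 2 noise levels x 8 noise conditions.
# The clean condition has no noise level; it is counted once per level here,
# which assumes two separate clean recordings.
exp7 = list(product(["device_1"], modes, ["nominal", "+6 dB"], noise_conditions))

print(len(exp6), len(exp7))
```

Both enumerations give 48 test conditions, which together with the 12 reference conditions yields the 60 conditions per experiment.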

Tables 18 and 19 detail the conditions for both experiments. 

Table 18: Summary of experimental conditions for EXP 6 (NB) 



Experiment                            | 6
Number of devices                     | 2 (HS AMR 12.2; HS AMR 5.9; HHHF AMR 5.9)
Number of noise conditions per device | 8 noise conditions
Number of reference conditions        | 12
Number of test conditions             | 48
Number of talkers                     | 4
Number of samples per talker          | 4
Number of votes per condition         | 128
Method of presentation                | Diotic
Presentation level (for -26 dBov)     | 73 dB SPL
Headphones                            | HD280 PRO
Reference set                         | According to table 1 and batch processing script in section 8.3 of [i.1].
Noise conditions                      | Pub_Noise_binaural_V2
                                      | Outside_Traffic_Road_binaural
                                      | Outside_Traffic_Crossroads_binaural
                                      | Clean (no noise)
                                      | Fullsize_Car1_130Kmh_binaural
                                      | Cafeteria_Noise_binaural
                                      | Mensa_binaural
                                      | Work_Noise_Office_Callcenter_binaural



Table 19: Summary of experimental conditions for EXP 7 (NB) 



Experiment                            | 7
Number of devices                     | 1 (HS AMR12.2; HS AMR5.9, HHHF AMR12.2, HHHF AMR5.9)
Number of noise conditions per device | 16 noise conditions
Number of reference conditions        | 12
Number of test conditions             | 48
Number of talkers                     | 4
Number of samples per talker          | 4
Number of votes per condition         | 128
Method of presentation                | Diotic
Presentation level (for -26 dBov)     | 73 dB SPL
Headphones                            | HD280 PRO
Reference set                         | According to table 1 and batch processing script in section 8.3 of [i.1].
Noise conditions                      | Pub_Noise_binaural_V2 (nominal and +6 dB)
                                      | Outside_Traffic_Road_binaural (nominal and +6 dB)
                                      | Outside_Traffic_Crossroads_binaural (nominal and +6 dB)
                                      | Clean (no noise, two different recordings)
                                      | Fullsize_Car1_130Kmh_binaural (nominal and +6 dB)
                                      | Cafeteria_Noise_binaural (nominal and +6 dB)
                                      | Mensa_binaural (nominal and +6 dB)
                                      | Work_Noise_Office_Callcenter_binaural (nominal and +6 dB)



The results for Experiments 6 and 7 are summarized in figures 17 and 18. The results for S-MOS (SIG), N-MOS (BAK) 
and G-MOS (OVRL) of 60 conditions (being 48 test and 12 reference conditions) are reported for each experiment. 
Results are sorted by OVRL. 
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It can be seen that both experiments exercised the entire range of degradations for the SIG, BAK and OVRL scales. 
About 67 % of the scores for OVRL are > 3,0 in both tests. This is in contrast with previous experiments conducted by 
the source where 3,0 represented the median of the scores for OVRL. This effect is observed despite an attempt to 
increase the range of degradations by including raised noise levels and AMR 5,9 kbps speech coding. 



Figure 17: Results of Experiment 6 (60 conditions: 48 test and 12 reference)

Figure 18: Results of Experiment 7 (60 conditions: 48 test and 12 reference)



8.3.2 Description of validation results 



Each individual sample used in Experiments 6 and 7 was processed by HEAD Acoustics GmbH using the re-trained
P.835 objective predictor model. An average of the objective scores per condition (average of the scores of 16 samples),
as well as the 95 % confidence interval, was computed and plotted against the results of the subjective test. Scatter plots
for N-MOS, S-MOS and G-MOS are shown in figures 19 to 24.

Figure 19: Experiment 6 N-MOS scatter plot (P.835 BAK scores and objective prediction for EXP 6, unmapped)
Figure 20: Experiment 6 S-MOS scatter plot (P.835 SIG scores and objective prediction for EXP 6, unmapped)
Figure 21: Experiment 6 G-MOS scatter plot (P.835 OVRL scores and objective prediction for EXP 6, unmapped)



Figure 22: Experiment 7 N-MOS scatter plot (P.835 BAK scores and objective prediction for EXP 7, unmapped)
Figure 23: Experiment 7 S-MOS scatter plot (P.835 SIG scores and objective prediction for EXP 7, unmapped)
Figure 24: Experiment 7 G-MOS scatter plot (P.835 OVRL scores and objective prediction for EXP 7, unmapped)

The Pearson correlation coefficient, RMSE and RMSE* performance parameters specified in [i.9] were computed for
both validation databases and are reported in tables 20 and 21, before and after 1st and 3rd order mapping.



ETSI TS 103 106 V1.2.1 (2013-03)



Table 20: Performance of the objective predictor on NB validation database from EXP6

Condition                 | S-MOS | N-MOS | G-MOS
Correlation               | 0,96  | 0,95  | 0,95
RMSE, no mapping          | 0,37  | 0,32  | 0,32
RMSE, 1st order mapping   | 0,26  | 0,30  | 0,28
RMSE, 3rd order mapping   | 0,19  | 0,30  | 0,28
RMSE*, no mapping         | 0,28  | 0,20  | 0,22
RMSE*, 1st order mapping  | 0,17  | 0,18  | 0,18
RMSE*, 3rd order mapping  | 0,09  | 0,17  | 0,18



Table 21: Performance of the objective predictor on NB validation database from EXP7

Condition                 | S-MOS | N-MOS | G-MOS
Correlation               | 0,87  | 0,99  | 0,97
RMSE, no mapping          | 0,45  | 0,13  | 0,36
RMSE, 1st order mapping   | 0,36  | 0,13  | 0,19
RMSE, 3rd order mapping   | 0,33  | 0,12  | 0,16
RMSE*, no mapping         | 0,33  | 0,04  | 0,23
RMSE*, 1st order mapping  | 0,28  | 0,04  | 0,12
RMSE*, 3rd order mapping  | 0,25  | 0,04  | 0,07



Application of the retrained model 



In order to avoid ambiguities in the results the objective model should be applied in the way it was applied during the 
training process which also reflects the listening test: 

1) The speech samples used in conjunction with the model should be the ones used in the subjective tests: 
16 sentences of male and female speakers, American English. 

2) The results should be calculated on a per sentence basis and averaged over all 16 samples. 

3) The background noises to be used in conjunction with the model shall be taken from EG 202 396-1 [i.2]. 

4) The setup is according to EG 202 396-1 [i.2]. 
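
EXAMPLE: Item 2) above can be sketched as follows: the model is run once per sentence and the per-sentence scores are averaged over the 16 samples of a condition. The sketch also computes a 95 % confidence interval of the kind plotted in the validation clauses. The function name, the sample values and the normal-approximation factor 1,96 are assumptions for illustration; the present document does not state which confidence interval formula was used.

```python
import math
import statistics


def condition_summary(sample_scores, z=1.96):
    """Return (mean, half-width of the 95 % confidence interval) for one condition.

    z = 1.96 is the normal-approximation quantile; this choice is an assumption.
    """
    n = len(sample_scores)
    mean = statistics.fmean(sample_scores)
    half_width = z * statistics.stdev(sample_scores) / math.sqrt(n)
    return mean, half_width


# Hypothetical per-sentence S-MOS predictions for the 16 scored samples of one condition.
s_mos_samples = [3.0] * 8 + [4.0] * 8
mean, half_width = condition_summary(s_mos_samples)
print(mean, half_width)
```

The same function would be applied separately to the S-MOS, N-MOS and G-MOS predictions of each condition.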
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Annex A (normative): 

Summary of Retraining Databases 





Database | Lab | Tests | BW | # Noise Types | NS Level | Ref SNR [dB] | # Reference Conditions | # Test Conditions | # Devices | Use case | Listening Instrument | Listening Mode | Presentation Level [dBSPL] | # talkers | # samples per talker | # of listeners | # votes per sample | # votes per condition | Signals available | Contribution (reference)
1 | Audience | 1 | NB | 8 (replace Crossroads with clean speech) | Table 1 | 0, 12, 24, 36 |  | 48 | 6 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 2 | 32 | 16 | 128 | CMU OUT, PRI MIC IN | S4-120322
2 | Audience | 2 | NB | 8 (replace Crossroads with clean speech) | Table 1 | 0, 12, 24, 36 |  | 48 | 6 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 2 | 32 | 16 | 128 | CMU OUT, PRI MIC IN | S4-120322
3 | Audience | 3 | WB | 8 (replace Crossroads with clean speech) | Table 1 | 10, 20, 30, 40 |  | 48 | 6 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 2 | 32 | 16 | 128 | CMU OUT, PRI MIC IN | S4-120322
4 | Audience | 4 | WB | 8 (replace Crossroads with clean speech) | Table 1 | 10, 20, 30, 40 |  | 48 | 6 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 2 | 32 | 16 | 128 | CMU OUT, PRI MIC IN | S4-120322
5 | Qualcomm/Dynastat | 1 | NB | 6 (Pub, Road, Train, Car, Mensa, clean speech) | Table 1 | 0, 12, 24, 36 |  | 48 | 8 | HS | HD25 | diotic | 73 | 4 (2M, 2F) | 8 | 32 | 4 | 128 | CMU OUT, PRI MIC IN, MRP | S4-120375
6 | Qualcomm | 2 | NB | 8 (replace Train with clean speech) | Table 1 | 0, 12, 24, 36 |  | 48 | 3 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 4 | 32 | 8 | 128 | CMU OUT, PRI MIC IN, MRP | S4-120375
7 | Qualcomm | 3 | NB | 8 (replace Train with clean speech) | Table 1 | 0, 12, 24, 36 |  | 48 | 6 | HS | HD280PRO | diotic | 73 | 4 (2M, 2F) | 4 | 32 | 8 | 128 | CMU OUT, PRI MIC IN, MRP | S4-120375
8 | Orange SA | 1 | WB | 5 (Car, Road, Train, Cafeteria, Office) | Table 2 | 10, 20, 30, 40 |  | 90 | 6 | HS | HD25 | monaural | 79 | 6 (3M, 3F) | 2 | 24 | 24 | 288 | CMU OUT, PRI MIC IN | S4-120348
9 | Qualcomm | 4 | WB | 8 (replace Train with clean speech) | Table 2 | 10, 20, 30, 40 |  | 48 | 6 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 4 | 32 | 8 | 128 | CMU OUT, PRI MIC IN, MRP | S4-120467
10 | Qualcomm | 5 | WB | 8 (replace Train with clean speech) | Table 1 | 10, 20, 30, 40 |  | 48 | 6 | HS | HD280PRO | diotic | 73 | 4 (2M, 2F) | 4 | 32 | 8 | 128 | CMU OUT, PRI MIC IN, MRP | S4-120619
11 | Audience | 1A | NB | 8 (noise level +6 dB) | Table 1 | 0, 12, 24, 36 |  | 48 | 6 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 2 | 32 | 16 | 128 | CMU OUT, PRI MIC IN | S4-120655
12 | Audience | 2A | NB | 8 (noise level +6 dB) | Table 1 | 0, 12, 24, 36 |  | 48 | 6 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 2 | 32 | 16 | 128 | CMU OUT, PRI MIC IN | S4-120655
13 | Audience | 3A | WB | 8 (noise level +6 dB) | Table 1 | 10, 20, 30, 40 |  | 48 | 6 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 2 | 32 | 16 | 128 | CMU OUT, PRI MIC IN | S4-120655
14 | Audience | 4A | WB | 8 (noise level +6 dB) | Table 1 | 10, 20, 30, 40 |  | 48 | 6 | HS&HHHF | HD280PRO | diotic | 73 | 4 (2M, 2F) | 2 | 32 | 16 | 128 | CMU OUT, PRI MIC IN | S4-120655
15 | NOKIA Corp/Dynastat | 1 | WB | 8 | Table 1 | 0, 12, 24, 36 | 12 | 48 | 6 | HS | HD25 | diotic | 73 | 4 (2M, 2F) | 4 | 32 | 8 | 128 | CMU OUT, PRI MIC IN | S4-120813
16 | NOKIA Corp/Dynastat | 2 | WB | 8 | Table 1 | 0, 12, 24, 36 | 12 | 48 | 6 | HS | HD25 | diotic | 73 | 4 (2M, 2F) | 4 | 32 | 8 | 128 | CMU OUT, PRI MIC IN | S4-120813
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Annex B (normative): 

Test vectors for model verification 



The test vectors for verification of an objective model implementation are given in this annex. A model claiming to be
compatible with the present document shall achieve all scores with an accuracy of ±0,1 MOS.
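
EXAMPLE: A verification run against the test vectors could be organized as in the following sketch. It interprets the accuracy requirement as a symmetric tolerance of 0,1 MOS around each reference score, which is an assumption. The reference row is the first row of table B.1; the predicted values, the key layout and the function names are hypothetical.

```python
SCORE_NAMES = ("S-MOS", "N-MOS", "G-MOS")


def within_tolerance(reference, predicted, tol=0.1, eps=1e-9):
    """True if the prediction deviates from the reference by at most tol MOS."""
    return abs(reference - predicted) <= tol + eps


def verify(reference_rows, predicted_rows, tol=0.1):
    """Return a list of (key, score name, reference, predicted) for every miss."""
    failures = []
    for key, reference in reference_rows.items():
        predicted = predicted_rows[key]
        for name, ref, pred in zip(SCORE_NAMES, reference, predicted):
            if not within_tolerance(ref, pred, tol):
                failures.append((key, name, ref, pred))
    return failures


# Key: (noise, device, talker, sample). Reference scores from table B.1, first row.
reference_rows = {("cafeteria", "A", "m1", "s2"): (3.87, 3.29, 3.50)}
# Hypothetical output of an implementation under test.
predicted_rows = {("cafeteria", "A", "m1", "s2"): (3.90, 3.25, 3.62)}

failures = verify(reference_rows, predicted_rows)
print(failures)
```

In this made-up case the S-MOS and N-MOS predictions are within tolerance and the G-MOS prediction (deviation 0,12) is reported as a failure. An implementation would be considered verified only if the failure list is empty for all test vectors.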



B.1 Audience test vectors



The validation results below are for signals used in the validation Experiments 6 (NB) and 8 (WB) as reported in
clause 9. The reference conditions and noise types are as described in [i.1], but with levels of noise increased by 6 dB.
The [six] devices have been tested in a mix of handset and handheld speakerphone use cases. Predictions from the
model are presented at both sample and condition level. For each Experiment, 50 sample files and value sets are
provided for validation of implementations of this model.

The test vectors can be downloaded here: 

http://docbox.etsi.org/stq/Open/TS%20103%20106%20Wave%20files/Annex Bl 1 2 Audience%20Verification%20 
Data/ 

Table B.1 : Audience experiment 6 test vectors and objective scores to be 
achieved by an objective model implementation 











Per sample

Noise       | Device | talker | sample | SMOS | NMOS | GMOS
cafeteria   | A      | m1     | s2     | 3,87 | 3,29 | 3,50
car1        | A      | m1     | s2     | 3,51 | 3,42 | 3,27
crossroads  | A      | f1     | s4     | 3,54 | 3,42 | 3,29
crossroads  | A      | f2     | s3     | 3,18 | 3,34 | 3,00
mensa       | A      | f1     | s5     | 4,24 | 3,25 | 3,80
mensa       | A      | m2     | s4     | 3,62 | 3,45 | 3,36
office      | A      | f2     | s6     | 4,35 | 3,59 | 4,00
pub         | A      | f1     | s6     | 2,31 | 2,84 | 2,25
pub         | A      | m1     | s6     | 2,31 | 3,07 | 2,34
trafficRoad | A      | f1     | s7     | 2,82 | 2,85 | 2,57
cafeteria   | B      | f1     | s1     | 3,42 | 3,21 | 3,13
cafeteria   | B      | f2     | s2     | 3,22 | 3,20 | 2,98
car1        | B      | f2     | s3     | 3,33 | 3,59 | 3,19
car1        | B      | m1     | s3     | 3,45 | 3,59 | 3,27
crossroads  | B      | m1     | s3     | 3,30 | 3,56 | 3,15
mensa       | B      | f1     | s5     | 4,18 | 3,33 | 3,77
office      | B      | f2     | s6     | 4,33 | 3,58 | 3,98
office      | B      | m2     | s6     | 4,06 | 3,57 | 3,75
pub         | B      | m1     | s6     | 2,30 | 2,45 | 2,08
pub         | B      | m2     | s7     | 2,32 | 2,69 | 2,19
trafficRoad | B      | m1     | s8     | 2,68 | 2,31 | 2,24
trafficRoad | B      | m2     | s8     | 2,89 | 2,26 | 2,35
train       | B      | m1     | s1     | 2,22 | 3,07 | 2,29
train       | B      | m2     | s8     | 2,62 | 3,17 | 2,57
cafeteria   | D      | m2     | s2     | 3,64 | 2,87 | 3,17
car1        | D      | m2     | s3     | 3,20 | 4,07 | 3,22
crossroads  | D      | m2     | s4     | 3,78 | 3,88 | 3,61
mensa       | D      | m2     | s5     | 3,94 | 4,16 | 3,80
office      | D      | m2     | s6     | 4,28 | 3,75 | 3,99
pub         | D      | m2     | s7     | 3,42 | 3,90 | 3,34
trafficRoad | D      | m2     | s8     | 3,11 | 2,71 | 2,71
train       | D      | m2     | s8     | 3,54 | 3,85 | 3,42
cafeteria   | E      | m2     | s2     | 2,67 | 1,93 | 2,04
car         | E      | m2     | s3     | 1,85 | 1,92 | 1,56
crossroads  | E      | m2     | s4     | 2,98 | 1,65 | 2,08
mensa       | E      | m2     | s5     | 3,52 | 2,23 | 2,79
office      | E      | m2     | s6     | 3,48 | 2,41 | 2,84
pub         | E      | m2     | s7     | 1,13 | 1,27 | 1,00
trafficRoad | E      | m2     | s8     | 1,22 | 1,23 | 1,00
train       | E      | m2     | s8     | 2,82 | 1,34 | 1,79
cafeteria   | F      | m2     | s2     | 3,41 | 2,63 | 2,89
car         | F      | m2     | s3     | 3,72 | 2,57 | 3,10
crossroads  | F      | m2     | s4     | 3,86 | 2,55 | 3,20
mensa       | F      | m2     | s5     | 4,14 | 2,95 | 3,60
office      | F      | m2     | s6     | 4,18 | 3,63 | 3,87
pub         | F      | m2     | s7     | 2,95 | 1,65 | 2,06
trafficRoad | F      | m2     | s8     | 2,71 | 1,43 | 1,78
train       | F      | m2     | s8     | 3,87 | 1,96 | 2,92


Table B.2: Audience experiment 8 test vectors and objective scores to be 
achieved by an objective model implementation 











Per sample

Noise       | Device | talker | sample | SMOS | NMOS | GMOS
cafeteria   | A      | m2     | s1     | 3,64 | 3,50 | 3,23
car1        | A      | m1     | s2     | 3,69 | 4,17 | 3,54
crossroads  | A      | m2     | s3     | 3,57 | 3,65 | 3,23
mensa       | A      | f1     | s4     | 3,60 | 3,76 | 3,30
mensa       | A      | m1     | s5     | 3,91 | 4,21 | 3,73
office      | A      | f1     | s5     | 4,07 | 4,37 | 3,94
office      | A      | f2     | s6     | 4,20 | 4,31 | 4,02
pub         | A      | m2     | s6     | 2,44 | 4,02 | 2,60
trafficRoad | A      | m2     | s7     | 2,03 | 3,40 | 2,10
train       | A      | m1     | s1     | 3,04 | 4,06 | 3,01
cafeteria   | B      | f1     | s2     | 3,37 | 3,47 | 3,01
car1        | B      | m2     | s3     | 3,80 | 3,91 | 3,53
crossroads  | B      | m2     | s3     | 3,52 | 3,64 | 3,19
mensa       | B      | f1     | s5     | 3,79 | 3,73 | 3,44
mensa       | B      | m2     | s4     | 3,82 | 3,95 | 3,56
office      | B      | f2     | s5     | 4,01 | 4,04 | 3,75
office      | B      | m1     | s5     | 4,17 | 4,26 | 3,98
pub         | B      | f2     | s7     | 2,91 | 2,49 | 2,27
trafficRoad | B      | m2     | s7     | 2,02 | 3,30 | 2,06
train       | B      | f1     | s1     | 2,99 | 4,26 | 3,05
cafeteria   | D      | f2     | s1     | 3,63 | 4,43 | 3,60
car1        | D      | f2     | s3     | 3,52 | 4,66 | 3,60
mensa       | D      | f2     | s4     | 3,79 | 3,85 | 3,50
office      | D      | f1     | s6     | 4,21 | 4,84 | 4,24
pub         | D      | f1     | s7     | 2,85 | 4,03 | 2,87
trafficRoad | D      | f2     | s7     | 2,40 | 3,82 | 2,49
train       | D      | m1     | s8     | 3,85 | 4,53 | 3,81
cafeteria   | E      | m2     | s2     | 3,20 | 2,58 | 2,52
car         | E      | f2     | s3     | 3,31 | 2,48 | 2,55
mensa       | E      | f1     | s5     | 4,06 | 2,04 | 2,96
office      | E      | f1     | s6     | 3,84 | 2,65 | 3,03
pub         | E      | m2     | s6     | 2,17 | 1,56 | 1,41
trafficRoad | E      | f2     | s8     | 2,44 | 1,96 | 1,74
train       | E      | m2     | s1     | 2,73 | 1,92 | 1,90
cafeteria   | F      | f1     | s1     | 3,59 | 2,13 | 2,62
car         | F      | f2     | s2     | 3,66 | 2,89 | 2,99
mensa       | F      | f2     | s5     | 3,83 | 3,01 | 3,18
office      | F      | m1     | s6     | 4,09 | 3,36 | 3,54
office      | F      | m2     | s5     | 4,14 | 3,37 | 3,59
pub         | F      | m2     | s7     | 3,13 | 2,45 | 2,41
trafficRoad | F      | f1     | s8     | 3,14 | 2,43 | 2,41
train       | F      | f1     | s8     | 4,12 | 3,23 | 3,51
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B.2 Orange test vectors 



A subset of Orange validation database, comprised of the three scores [S-MOS, N-MOS, G-MOS] and the associated 
audio material [Clean, Noisy Input, Noise-reduced output] for each sample is provided for purposes of validation. This 
subset covers as much as possible the entire quality range and includes samples of conditions 2, 10, 19, 23, 26 and 30 as 
detailed in table B.3. 

The test vectors can be downloaded here: 

http://docbox.etsi.org/stq/Open/TS%20103%20106%20Wave%20files/Annex B2 Orange%20Verification%20Data/ 

Table B.3: Orange test vectors and objective scores to be achieved by 
an objective model implementation 













Columns 6 to 8 give the per-sample scores; columns 9 to 11 give the per-condition scores, shown on the first row of each condition.

File name    | Noise      | Device     | talker | sample | S-MOS | N-MOS | G-MOS | S-MOS (cond.) | N-MOS (cond.) | G-MOS (cond.)
1bff2s01.c02 | Mensa      | D1         | f2     | s1     | 4,04  | 3,79  | 3,68  | 4,06 | 3,92 | 3,74
1bff2s02.c02 | Mensa      | D1         | f2     | s2     | 3,70  | 4,39  | 3,64  |      |      |
1bfm1s01.c02 | Mensa      | D1         | m1     | s1     | 3,97  | 2,89  | 3,25  |      |      |
1bfm1s02.c02 | Mensa      | D1         | m1     | s2     | 4,13  | 4,08  | 3,87  |      |      |
1bfm2s01.c02 | Mensa      | D1         | m2     | s1     | 4,42  | 4,43  | 4,26  |      |      |
1bfm2s02.c02 | Mensa      | D1         | m2     | s2     | 4,09  | 3,92  | 3,77  |      |      |
1bff2s01.c10 | Crossroads | D4         | f2     | s1     | 3,59  | 3,15  | 3,05  | 3,74 | 3,18 | 3,18
1bff2s02.c10 | Crossroads | D4         | f2     | s2     | 3,87  | 3,68  | 3,49  |      |      |
1bfm1s01.c10 | Crossroads | D4         | m1     | s1     | 3,67  | 2,98  | 3,04  |      |      |
1bfm1s02.c10 | Crossroads | D4         | m1     | s2     | 3,71  | 3,45  | 3,26  |      |      |
1bfm2s01.c10 | Crossroads | D4         | m2     | s1     | 3,80  | 2,82  | 3,07  |      |      |
1bfm2s02.c10 | Crossroads | D4         | m2     | s2     | 3,77  | 3,02  | 3,14  |      |      |
1bff2s01.c19 | No noise   | Source     | f2     | s1     | 4,75  | 4,64  | 4,40  | 4,75 | 4,30 | 4,37
1bff2s02.c19 | No noise   | Source     | f2     | s2     | 4,74  | 4,39  | 4,40  |      |      |
1bfm1s01.c19 | No noise   | Source     | m1     | s1     | 4,76  | 3,47  | 4,19  |      |      |
1bfm1s02.c19 | No noise   | Source     | m1     | s2     | 4,76  | 4,43  | 4,40  |      |      |
1bfm2s01.c19 | No noise   | Source     | m2     | s1     | 4,72  | 4,63  | 4,40  |      |      |
1bfm2s02.c19 | No noise   | Source     | m2     | s2     | 4,75  | 4,26  | 4,40  |      |      |
1bff2s01.c23 | No noise   | NS Level 1 | f2     | s1     | 2,32  | 4,67  | 2,78  | 2,48 | 4,67 | 2,88
1bff2s02.c23 | No noise   | NS Level 1 | f2     | s2     | 2,58  | 4,75  | 2,98  |      |      |
1bfm1s01.c23 | No noise   | NS Level 1 | m1     | s1     | 2,65  | 4,55  | 2,94  |      |      |
1bfm1s02.c23 | No noise   | NS Level 1 | m1     | s2     | 2,33  | 4,70  | 2,80  |      |      |
1bfm2s01.c23 | No noise   | NS Level 1 | m2     | s1     | 2,30  | 4,63  | 2,75  |      |      |
1bfm2s02.c23 | No noise   | NS Level 1 | m2     | s2     | 2,69  | 4,73  | 3,04  |      |      |
1bff2s01.c26 | Crossroads | Source     | f2     | s1     | 4,76  | 2,48  | 3,77  | 4,73 | 2,42 | 3,72
1bff2s02.c26 | Crossroads | Source     | f2     | s2     | 4,74  | 2,40  | 3,73  |      |      |
1bfm1s01.c26 | Crossroads | Source     | m1     | s1     | 4,72  | 2,35  | 3,69  |      |      |
1bfm1s02.c26 | Crossroads | Source     | m1     | s2     | 4,76  | 2,36  | 3,73  |      |      |
1bfm2s01.c26 | Crossroads | Source     | m2     | s1     | 4,67  | 2,45  | 3,68  |      |      |
1bfm2s02.c26 | Crossroads | Source     | m2     | s2     | 4,72  | 2,48  | 3,74  |      |      |
1bff2s01.c30 | Crossroads | NS Level 1 | f2     | s1     | 3,36  | 1,95  | 2,37  | 2,94 | 2,00 | 2,10
1bff2s02.c30 | Crossroads | NS Level 1 | f2     | s2     | 3,20  | 2,02  | 2,28  |      |      |
1bfm1s01.c30 | Crossroads | NS Level 1 | m1     | s1     | 2,79  | 1,97  | 1,97  |      |      |
1bfm1s02.c30 | Crossroads | NS Level 1 | m1     | s2     | 3,29  | 2,01  | 2,34  |      |      |
1bfm2s01.c30 | Crossroads | NS Level 1 | m2     | s1     | 2,08  | 1,94  | 1,52  |      |      |
1bfm2s02.c30 | Crossroads | NS Level 1 | m2     | s2     | 2,94  | 2,08  | 2,12  |      |      |
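
EXAMPLE: The per-condition scores in table B.3 appear to be the arithmetic means of the six per-sample scores of the same condition. The present document does not state this explicitly, so it is treated here as an assumption that the sketch below checks for condition 2, using the values from the table. The helper names are illustrative only.

```python
def parse_mos(text):
    """Convert a score written with a decimal comma (e.g. '4,04') to a float."""
    return float(text.replace(",", "."))


def condition_mean(values):
    """Arithmetic mean of per-sample scores given as decimal-comma strings."""
    scores = [parse_mos(v) for v in values]
    return sum(scores) / len(scores)


# Condition 2 (Mensa, device D1), per-sample values from table B.3.
per_sample = {
    "S-MOS": ["4,04", "3,70", "3,97", "4,13", "4,42", "4,09"],
    "N-MOS": ["3,79", "4,39", "2,89", "4,08", "4,43", "3,92"],
    "G-MOS": ["3,68", "3,64", "3,25", "3,87", "4,26", "3,77"],
}
per_condition = {"S-MOS": "4,06", "N-MOS": "3,92", "G-MOS": "3,74"}

# A difference below 0,01 is accepted to allow for rounding in the table.
consistent = {
    name: abs(condition_mean(values) - parse_mos(per_condition[name])) < 0.01
    for name, values in per_sample.items()
}
print(consistent)
```

An implementation under verification can apply the same averaging to its own per-sample predictions before comparing them with the per-condition reference scores.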
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Annex C (normative): 

Speech material to be used for objective testing 

The following speech samples are used in conjunction with the model: 4 talkers (2 Males/2 Females), 8 Harvard
sentences per talker, each sample of 4 s duration.

The first 4 sentences are used during the adaptation period of the noise canceller under test; the remaining 16 samples
are used for calculating the objective scores.
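
EXAMPLE: The split between adaptation and scored samples can be sketched as follows. The sample identifiers follow the sequence listed in this annex; the function and variable names are illustrative and not part of the present document.

```python
# Playback sequence of the 20 samples listed in this annex.
SEQUENCE = [
    "m1s8", "f1s8", "m2s8", "f2s8",
    "m1s1", "f1s1", "m2s1", "f2s1",
    "m1s2", "f1s2", "m2s2", "f2s2",
    "m1s3", "f1s3", "m2s3", "f2s3",
    "m1s4", "f1s4", "m2s4", "f2s4",
]
ADAPTATION_COUNT = 4  # first sentences, used only for noise canceller convergence


def split_sequence(sequence, adaptation_count=ADAPTATION_COUNT):
    """Return (adaptation samples, samples used for the objective scores)."""
    return sequence[:adaptation_count], sequence[adaptation_count:]


adaptation, scored = split_sequence(SEQUENCE)
print(adaptation)
print(len(scored))
```

Only the recordings of the 16 scored samples would be passed to the objective model; the adaptation samples are played to the device under test but their recordings are not evaluated.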

The speech samples can be downloaded here: 

http://docbox.etsi.org/stq/Open/TS%20103%20106%20Wave%20files/Annex C Dvnastat%20Speech%20Data/ 



Seq | Sample | Harvard Sentence
1   | m1s8   | We tried to replace the coin but failed.
2   | f1s8   | A rod is used to catch pink salmon.
3   | m2s8   | Corn cobs can be used to kindle a fire.
4   | f2s8   | The crooked maze failed to fool the mouse.
5   | m1s1   | The empty flask stood on the tin tray.
6   | f1s1   | It is easy to tell the depth of a well.
7   | m2s1   | Acid burns holes in wool cloth.
8   | f2s1   | Note closely the size of the gas tank.
9   | m1s2   | He broke a new shoelace that day.
10  | f1s2   | The box was thrown beside the parked truck.
11  | m2s2   | Eight miles of woodland burned to waste.
12  | f2s2   | Mend the coat before you go out.
13  | m1s3   | The urge to write short stories is rare.
14  | f1s3   | Four hours of steady work faced us.
15  | m2s3   | A young child should not suffer fright.
16  | f2s3   | The stray cat gave birth to kittens.
17  | m1s4   | The pirates seized the crew of the lost ship.
18  | f1s4   | The boy was there when the sun rose.
19  | m2s4   | The fruit of a fig tree is apple shaped.
20  | f2s4   | The frosty air passed through the coat.






Annex D (informative): 

Subjective testing framework used for the present document 



D.1 Introduction 



This annex describes the framework for conducting subjective testing used for the validation of the model described in 
the present document. Such a framework is seen as necessary in order to minimize variations between subjective tests 
performed in different listening laboratories. 



D.2 Subjective test plan 
D.2.1 Traceability 



The subjective test method is described in Recommendation ITU-T P.835 [i.6] and the ITU-T Handbook of subjective
testing practical procedures [i.20], with the following observations:

D.2.2 Speech database requirements 

The source speech database (near end signal) to be used for data collection and listening tests needs to consist of at least 
8 samples (2 male and 2 female talkers, 2 samples per talker). 

The speech material needs to conform to the guidelines specified in the ITU-T handbook of subjective testing practical
procedures, section 5, and section B.3 of Recommendation ITU-T P.501 [i.13]. Each sample needs to be constructed
according to the guidelines described in Recommendation ITU-T P.835 [i.6] section 5.1.4 (including 1 s of leading and
1 s of trailing silence) and normalized to an active speech level [i.8] of -26 dBov. It is recommended that the source
speech material be 16 bit / 48 kHz.

D.2.3 Reference Conditions 

Reference conditions need to follow the proposal in [i.21], which incorporates a spectral subtraction based distortion
instead of the MNRU-based distortion typically used in subjective tests. The conditions used for the new SIG reference
system and specification for NS Levels are listed in tables D.1 and D.2.

D.2.4 Test Conditions 

Test conditions need to be recorded from real handset devices or from mock-up terminals for offline processing as
described in clause 3. Table D.1 lists the recommended test conditions used for the recordings and listening tests. At
least 6 out of the 8 noise types described should be included in the test to provide similarity of context between different
labs. Two of the 8 noise types can be replaced by either a clean speech transmission scenario (i.e. the background noise
reproduction is disabled) or other noise types taken from the ES 202 396-1 [i.19] database (except for the Male Single
Voice Distractor noise type, see note 1).

NOTE 1: As speech and music carry contextual information, they can be viewed as a separate class of distractors
and more study was felt necessary for their inclusion.
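The selection rule above can be checked mechanically. The following is a minimal sketch, assuming a lab describes its test conditions as a list of noise-type names; the label `clean_speech` and the identifier used for the excluded distractor noise are hypothetical and not defined by the present document.

```python
# Noise types listed for test conditions i13 to i20 in table D.1.
TABLE_D1_NOISE_TYPES = frozenset({
    "Pub_Noise_binaural_V2",
    "Outside_Traffic_Road_binaural",
    "Outside_Traffic_Crossroads_binaural",
    "Train_Station_binaural",
    "Fullsize_Car1_130Kmh_binaural",
    "Cafeteria_Noise_binaural",
    "Mensa_binaural",
    "Work_Noise_Office_Callcenter_binaural",
})

# Hypothetical labels (assumptions made for this sketch only).
CLEAN_SPEECH = "clean_speech"
EXCLUDED_NOISE_TYPES = frozenset({"Male_Single_Voice_Distractor"})


def check_noise_selection(selected):
    """Return a list of problems found in a lab's choice of test conditions."""
    problems = []
    used_from_table = {name for name in selected if name in TABLE_D1_NOISE_TYPES}
    if len(used_from_table) < 6:
        problems.append(
            f"only {len(used_from_table)} of the 8 table D.1 noise types used, "
            "at least 6 are needed"
        )
    for name in selected:
        if name in EXCLUDED_NOISE_TYPES:
            problems.append(f"{name} is excluded (see note 1)")
    return problems


# Example: six table D.1 noises plus a clean condition and one other noise.
example = sorted(TABLE_D1_NOISE_TYPES)[:6] + [CLEAN_SPEECH, "Some_Other_Noise"]
print(check_noise_selection(example))
```

An empty list means the selection satisfies the two constraints stated above; the sketch does not check any other requirement of the test plan.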

Either handset, headset or handheld hands-free usage modes are acceptable. The inclusion of hands-free test and headset 
cases is optional and intended to span a larger range of degradations for the purposes of re-training of the objective 
predictor model. 

NOTE 2: During the derivation of the training databases it was found that relatively few conditions had SIG scores
below 3.0. For the validation stage it is desired that the tests also include conditions that cover a broader
range of the SIG scale, while also achieving a good distribution of BAK and OVRL scores. This can be
accomplished by one (or a combination) of the following means:

■ A pre-selection of UEs with known poor speech quality for use in the test (preferred)

■ A higher proportion of UEs operating in hand-held hands-free mode

■ Use of lower speech codec bit-rates (as these were not part of the training databases, the lower
speech codec bit-rates will only be considered in the validation, in situations in which the model is
capable of handling these types of impairments).

Table D.1: Test and Reference conditions for narrowband subjective evaluation of noise reduction

Reference Conditions

File | SIG.              | SNR      | Noise Type
i01  | Source (filtered) | No Noise | -
i02  | Source (filtered) | 0 dB     | Fullsize_Car1_130Kmh_binaural
i03  | Source (filtered) | 12 dB    | Fullsize_Car1_130Kmh_binaural
i04  | Source (filtered) | 24 dB    | Fullsize_Car1_130Kmh_binaural
i05  | Source (filtered) | 36 dB    | Fullsize_Car1_130Kmh_binaural
i06  | NS Level 1        | No Noise | -
i07  | NS Level 2        | No Noise | -
i08  | NS Level 3        | No Noise | -
i09  | NS Level 4        | No Noise | -
i10  | NS Level 3        | 24 dB    | Fullsize_Car1_130Kmh_binaural
i11  | NS Level 2        | 12 dB    | Fullsize_Car1_130Kmh_binaural
i12  | NS Level 1        | [0 dB]   | Fullsize_Car1_130Kmh_binaural

Test Conditions

File | Speech level @MRP, Handset/handsfree | Noise level @ HATS ear simulators with ID correction | Noise Type | Description of Noise from EG 202 396-1 [i.2]
i13  | -1,7/+1,3 dBPa | L: 75,0 dB(A) / R: 73,0 dB(A) | Pub_Noise_binaural_V2                 | Recording in a pub
i14  | -1,7/+1,3 dBPa | L: 74,9 dB(A) / R: 73,9 dB(A) | Outside_Traffic_Road_binaural         | Recording at pavement
i15  | -1,7/+1,3 dBPa | L: 69,1 dB(A) / R: 69,6 dB(A) | Outside_Traffic_Crossroads_binaural   | Recording at pavement
i16  | -1,7/+1,3 dBPa | L: 68,2 dB(A) / R: 69,8 dB(A) | Train_Station_binaural                | Recording at departure platform
i17  | -1,7/+1,3 dBPa | L: 69,1 dB(A) / R: 68,1 dB(A) | Fullsize_Car1_130Kmh_binaural         | Recording in passenger cabin
i18  | -1,7/+1,3 dBPa | L: 68,4 dB(A) / R: 67,3 dB(A) | Cafeteria_Noise_binaural              | Recording at sales counter
i19  | -1,7/+1,3 dBPa | L: 63,4 dB(A) / R: 61,9 dB(A) | Mensa_binaural                        | Recording in a cafeteria
i20  | -1,7/+1,3 dBPa | L: 56,6 dB(A) / R: 57,8 dB(A) | Work_Noise_Office_Callcenter_binaural | Recording in a business office

Table D.2: Test and Reference conditions for wideband subjective evaluation of noise reduction

Reference Conditions

File | SIG.                              | SNR      | Noise Type
i01  | Source (filtered)                 | No Noise | -
i02  | Source (filtered)                 | 10 dB    | Outside_Traffic_Crossroads_binaural
i03  | Source (filtered)                 | 20 dB    | Outside_Traffic_Crossroads_binaural
i04  | Source (filtered)                 | 30 dB    | Outside_Traffic_Crossroads_binaural
i05  | Source (filtered)                 | 40 dB    | Outside_Traffic_Crossroads_binaural
i06  | NS Level 1, 2nd set of parameters | No Noise | -
i07  | NS Level 2, 2nd set of parameters | No Noise | -
i08  | NS Level 3, 2nd set of parameters | No Noise | -
i09  | NS Level 4, 2nd set of parameters | No Noise | -
i10  | NS Level 3, 2nd set of parameters | 30 dB    | Outside_Traffic_Crossroads_binaural
i11  | NS Level 2, 2nd set of parameters | 20 dB    | Outside_Traffic_Crossroads_binaural
i12  | NS Level 1, 2nd set of parameters | 10 dB    | Outside_Traffic_Crossroads_binaural

D.2.5 Pre-processing of reference conditions

For the reference conditions, the clean speech and noise signals need to be filtered with the LP35 and MSIN (for
narrowband) and 78 KBP (for wideband) filters available with a modified "filter" demo program from
Recommendation ITU-T G.191 [i.11] (available from ORANGE SA upon request). Appropriate resampling needs to be
used prior to application of the filters. The necessary upsampling / downsampling is also performed through the use of
the Recommendation ITU-T G.191 [i.11] "filter" demo program. The clean speech is then processed with the spectral
subtraction algorithm in annex A at the appropriate settings and, prior to mixing, normalized to an active speech level of
-26 dBov. The mixing needs to be performed with the appropriate Recommendation ITU-T G.191 [i.11] tool to obtain
the SNRs described in table D.1. The SNR is defined as the ratio of the active speech level to the A-weighted noise
level. For more details on NS Levels, see [i.22]. Clause D.4 shows a block diagram of the necessary processing steps.
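The scaling implied by this SNR definition can be written out as simple level arithmetic. The following is an illustrative sketch only: it assumes the active speech level and the A-weighted noise level have already been measured on a common dB scale (the measurement itself is not shown), and the function names and the example noise level are hypothetical. It is not the ITU-T G.191 tool referred to above.

```python
def noise_gain_db(speech_level_db, noise_level_dba, target_snr_db):
    """Gain in dB to apply to the noise so that
    (active speech level) - (A-weighted noise level) equals the target SNR."""
    return (speech_level_db - target_snr_db) - noise_level_dba


def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear scale factor."""
    return 10.0 ** (gain_db / 20.0)


SPEECH_LEVEL_DBOV = -26.0      # speech normalized prior to mixing
measured_noise_dba = -30.0     # hypothetical A-weighted level of the noise file

# Narrowband reference SNRs of table D.1.
for snr_db in (0, 12, 24, 36):
    gain = noise_gain_db(SPEECH_LEVEL_DBOV, measured_noise_dba, snr_db)
    print(f"SNR {snr_db:2d} dB: noise gain {gain:+.1f} dB, factor {db_to_linear(gain):.4f}")
```

The noise samples would then be multiplied by the linear factor before being added to the speech signal.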

D.2.6 Post-processing of test conditions 

The uplink recordings of processed speech materials need to be normalized for use in the subjective tests. For the test
conditions, the normalization gain is the gain necessary to obtain a recorded active speech level of -26 dBov with a
clean speech condition (no noise applied in the room). The same normalization gain then needs to be applied to all
other test conditions for the same device (noise suppressed speech signals). In this way, the effect of level changes
introduced by terminals in the presence of noise becomes part of the quality measurement.
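As an illustration of this rule, the sketch below derives one gain per device from the clean speech recording and applies it unchanged to every other recording of that device. The measured levels are hypothetical values chosen for the example, and the active speech level measurement itself is assumed to be available from elsewhere.

```python
TARGET_DBOV = -26.0


def normalization_gain_db(clean_level_dbov, target_dbov=TARGET_DBOV):
    """Gain in dB that brings the clean speech recording to the target level."""
    return target_dbov - clean_level_dbov


def apply_gain(levels_dbov, gain_db):
    """Apply the same gain to the level of every condition of one device."""
    return {name: level + gain_db for name, level in levels_dbov.items()}


# Hypothetical active speech levels measured for one device (dBov).
measured = {"clean": -30.5, "pub": -33.0, "car": -31.5}

gain = normalization_gain_db(measured["clean"])
normalized = apply_gain(measured, gain)
print(gain)        # gain derived from the clean condition only
print(normalized)  # level differences between conditions are preserved
```

In this example the pub recording stays 2,5 dB below the clean recording after normalization, so a level drop caused by the terminal in noise remains audible to the listening panel.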

D.2.7 Calibration and equalization of headphones for presentation 

Headphones used for presentation of the test material to the listening panel should be calibrated and equalized using a
HATS conforming to Recommendation ITU-T P.58 [i.14] and an artificial ear type 3.3 according to Recommendation
ITU-T P.57 [i.15]. The HATS is diffuse field equalized. The resulting frequency response characteristic of the
headphones used in the subjective experiments needs to be within the mask given in TS 126 131 [i.16], clause 6.4.2.

The presentation of the test and reference conditions to listeners needs to be diotic. The system gain is adjusted so that a 
speech segment of -26 dBov corresponds to a presentation level of 73 dB SPL measured at the DRP with diffuse-field 
equalization. 
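With a fixed system gain, this calibration point defines a constant offset between digital level and presentation level. A minimal sketch of that arithmetic follows; the function name is hypothetical and the mapping assumes a linear playback chain.

```python
REFERENCE_DBOV = -26.0     # digital level of the calibration speech segment
REFERENCE_DB_SPL = 73.0    # presentation level at the DRP, diffuse-field equalized


def presentation_level_db_spl(level_dbov):
    """Presentation level of a segment with the given level in dBov,
    assuming the system gain is fixed at the calibration point."""
    return level_dbov + (REFERENCE_DB_SPL - REFERENCE_DBOV)


print(presentation_level_db_spl(-26.0))  # calibration point
print(presentation_level_db_spl(-30.0))  # a segment 4 dB below the calibration level
```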

D.2.8 Requirements on the listening laboratory 

Listening laboratory facilities need to comply with the recommendations provided in Recommendation ITU-T
P.800 [i.17].

D.2.9 Experimental design 

The use of the Balanced Blocks experimental design described in [i.20], section 3.3.2 is recommended. The
experimental design needs to include the 12 reference conditions and 8 test conditions per device under test, described
in table D.1 (alternatively, for wideband, table D.2 can be used). A minimum of two and a maximum of six devices
need to be included in any one test.

The test and reference conditions should be reported for a total of 32 naive listeners. The listeners need to be native 
speakers of the language used for the test. 

128 votes per condition need to be obtained. The number of votes per sample will depend on the number of samples per 
talker chosen. A minimum of 2 samples per talker and 8 votes per sample needs to be used. 
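The relation between these numbers can be checked with a short calculation. The sketch below assumes the four talkers of clause D.2.2 (2 male, 2 female) and an even distribution of votes over samples and listeners; both function names are hypothetical.

```python
def votes_per_sample(samples_per_talker, talkers=4, votes_per_condition=128):
    """Votes collected per speech sample for one condition, assuming the
    votes are spread evenly over all samples."""
    samples = talkers * samples_per_talker
    if votes_per_condition % samples:
        raise ValueError("votes cannot be spread evenly over the samples")
    return votes_per_condition // samples


def votes_per_listener(listeners=32, votes_per_condition=128):
    """Votes each listener gives per condition, assuming an even split."""
    return votes_per_condition // listeners


print(votes_per_listener())   # votes per condition from each of the 32 listeners
print(votes_per_sample(2))    # with the minimum of 2 samples per talker
print(votes_per_sample(4))    # with 4 samples per talker
```

Under these assumptions, 4 samples per talker is the largest choice that still meets the minimum of 8 votes per sample.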



D.2.10 Training session



Prior to administration of the test, subjects need to be provided with written instructions on the test procedures. The use
of training materials (e.g. videos, presentations) is encouraged to ensure the participants fully understand the task being
requested. The training session needs to be followed by a practice session containing 16 trials. The practice session
needs to include conditions representative of those presented in the test. An example is provided below:

Trial | Sample   | Condition
1     | m1s3.r01 | Reference - Source/No noise
2     | f2s1.x06 | Test - Cafeteria
3     | m2s4.r11 | Reference - NS-L2/12 dB SNR
4     | f1s1.r02 | Reference - Source/0 dB SNR
5     | m2s3.x03 | Test - Traffic-crossroads
6     | f1s1.x05 | Test - Fullsize car
7     | m2s1.r07 | Reference - NS-L2/No noise
8     | f2s2.x02 | Traffic-road
9     | m2s2.r03 | Reference - Source/12 dB SNR
10    | f2s2.r06 | Reference - NS-L1/No noise
11    | m2s4.x01 | Pub
12    | f2s3.x08 | Test - Call-center
13    | m2s4.r04 | Reference - Source/24 dB SNR
14    | f2s1.x04 | Test - Train station
15    | m2s3.r12 | Reference - NS-L1/0 dB SNR
16    | f2s3.x07 | Test - Mensa babble



NOTE: X is a device outside the set of DUTs. 



D.3 Set-up for acquisition of test conditions 
D.3.1 Terminal positioning and HATS calibration 

For reproduction of the near-end signal, a HATS conforming to Recommendation ITU-T P.58 [i.14] is used. The mouth
simulator needs to be equalized to achieve the reproduction accuracy described in TS 126 132 [i.18], clause 5.3.

For handset and headset mode testing, the mouth sensitivity gain needs to be adjusted to produce an active speech level 
of -1,7 dBPa at MRP for a -26 dBov input speech signal. 

The handset terminals or mock-ups under test need to be set-up on HATS and the handset mounting position
documented as described in TS 126 132 [i.18], clause 5.1.1.

Headsets need to be set-up on HATS as described in TS 126 132 [i.18], clause 5.1.2. 

For handheld hands-free mode the device is set-up using HATS as described in TS 126 132 [i.18], clause 5.1.3.3.

For handheld hands-free mode testing, the mouth sensitivity gain needs to be adjusted to produce an active speech level
of +1,3 dBPa at MRP for a -26 dBov input speech signal.

D.3.2 Background Noise reproduction

The background noise reproduction system needs to be set-up and equalized according to ES 202 396-1 [i.19]. Noise
types need to be reproduced at their realistic levels according to ES 202 396-1 [i.19] clause 8. The test conditions and
noise files are specified in table D.1.

D.3.3 Noise and speech playback synchronization

The noise and speech playback needs to be time aligned and synchronized. This is generally the case when playing
the noise and speech files out of multiple channels of the same hardware interface, but appropriate synchronization needs
to be ensured when using separate hardware for noise and speech playback.



D.3.4 Convergence sequence



For proper convergence of terminal noise suppression the following time sequencing should be applied:

1) the terminal is set-up and a call is established in noise free conditions;

2) 2 seconds of noise only is applied in the test room with a linear amplitude fade-in from 0 to 2 seconds (noise
ramp-up period), immediately followed by;

3) 6 seconds of noise only, immediately followed by;

4) 16 seconds (4 samples) of simultaneous speech and noise, immediately followed by;

5) actual test material to be used for listening panel presentation.

To allow for proper convergence of noise suppressors, an additional four sentences need to be placed immediately prior 
to the 32 sentences used in subjective testing. 
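The timing of the convergence sequence can be laid out as a simple timeline. The sketch below is illustrative only: the segment labels and function names are hypothetical, and time zero is taken as the start of noise playback (after the call has been established in noise free conditions).

```python
# Convergence segments as (label, duration in seconds), in playback order.
SEGMENTS = [
    ("noise ramp-up (linear fade-in)", 2),
    ("noise only", 6),
    ("speech and noise, 4 samples of 4 s", 16),
]


def timeline(segments):
    """Return (start_s, end_s, label) tuples for consecutive segments."""
    result, t = [], 0
    for label, duration in segments:
        result.append((t, t + duration, label))
        t += duration
    return result


def ramp_gain(t_s, ramp_s=2.0):
    """Linear amplitude gain applied to the noise at time t_s."""
    return min(max(t_s / ramp_s, 0.0), 1.0)


plan = timeline(SEGMENTS)
test_material_start_s = plan[-1][1]   # the test material follows the last segment

for start, end, label in plan:
    print(f"{start:3d} s to {end:3d} s: {label}")
print("test material starts at", test_material_start_s, "s")
```

The resulting start time of the test material equals the total convergence period of 2 s + 6 s + 16 s.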

D.3.5 Example of noise and speech playback sequence including convergence period

Figure D.1 illustrates an example of a playback time history for speech and one particular noise signal (Fullsize car
1 130 km/h binaural). The following applies to the example in figure D.1:

1) The speech signal is constructed by concatenating 8 seconds of silence with 36 speech samples of 4 s each. 
The total length is therefore 152 seconds. The first 24 seconds are used for convergence of the noise 
suppression algorithm and not used for the purposes of listening panel presentation. 

2) The noise signal is constructed by concatenating 6 repetitions of a noise sample and the first 8 seconds of the
7th repetition. The noise sample is cut out of, or generated from, the original noise file in the EG 202 396-1 [i.2]
database to be 24 s in length and fade-in and fade-out processing is applied to the first and last 50 samples
(assuming noise at 48 kHz sampling rate) to ensure zero-crossing of the signal amplitude at beginning and end
of the sample. A linear fade-in is applied to the first 2 seconds of the concatenated noise signal, as this was
found necessary for proper convergence of some terminals.

It is noted that by looping the noise every 24 seconds (a multiple of the speech sample length of 4 s) the sharp
transitions in the noise amplitude at the looping point coincide with the location of sample cutting for listening panel
presentation. This prevents audible sharp transitions from falling within a speech segment.
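The construction in this example can be sketched in a few lines. The code below is an illustration under stated assumptions, not the procedure used to produce the original material: signals are plain Python lists of amplitudes, the fades are linear, and all function and variable names are hypothetical. A low sampling rate is used in the usage lines only to keep the example fast.

```python
SILENCE_S, SAMPLE_S, N_SAMPLES, LOOP_S = 8, 4, 36, 24

# Speech track length: 8 s of silence plus 36 samples of 4 s each.
total_s = SILENCE_S + N_SAMPLES * SAMPLE_S

# Times at which speech samples start or end, and times at which the noise loops.
boundaries = {SILENCE_S + k * SAMPLE_S for k in range(N_SAMPLES + 1)}
loop_points = list(range(LOOP_S, total_s, LOOP_S))


def fade_edges(sample, fade_len=50):
    """Copy of the noise sample with linear fades over its first and last
    fade_len values, so that it starts and ends at zero amplitude."""
    out = list(sample)
    n = len(out)
    for i in range(min(fade_len, n)):
        g = i / fade_len
        out[i] *= g           # fade-in
        out[n - 1 - i] *= g   # fade-out
    return out


def build_noise_track(noise_sample, fs, total_s=152, ramp_s=2, fade_len=50):
    """Loop the faded noise sample up to total_s seconds and apply a linear
    fade-in over the first ramp_s seconds of the concatenated signal."""
    faded = fade_edges(noise_sample, fade_len)
    total_len = total_s * fs
    repetitions = -(-total_len // len(faded))   # ceiling division
    track = (faded * repetitions)[:total_len]
    ramp_len = ramp_s * fs
    for i in range(ramp_len):
        track[i] *= i / ramp_len
    return track


# Usage with a constant-amplitude 24 s "noise" at a reduced rate of 100 Hz.
fs = 100
track = build_noise_track([1.0] * (LOOP_S * fs), fs, total_s=total_s)
print(total_s, len(track) / fs)
print(all(p in boundaries for p in loop_points))
```

With a 24 s noise sample the loop yields 6 full repetitions plus the first 8 s of the 7th, and every loop point falls on a boundary between speech samples.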



Figure D.1: Noise and speech playback sequence, including convergence period

D.3.6 Recordings at the network simulator electrical reference point

The network simulator needs to be set to a WCDMA voice call with the AMR 12,2 kbps speech codec in narrowband
tests and AMR-WB 12,65 kbps speech codec in wideband tests. DTX needs to be enabled. The send signal is
recorded at the electrical reference point of a network simulator to generate the test conditions (noise suppressed
speech) for the subjective test. The send frequency response of the terminals needs to be measured according to
TS 126 132 [i.18] clauses 7.4.1, 7.4.3, 8.4.1 or 8.4.3 (depending on the testing condition being performed) and the
results documented.

D.3.7 Recordings at the MRP and terminal's primary microphone location

In addition to the recordings at the electrical reference point of a network simulator, the acoustic signals at MRP and
primary microphone position can be recorded for further reference and for use in objective predictor retraining.

D.4 Processing test plan block diagram



[Block diagram not reproduced. Recoverable content: the inputs Source_Speech.wav (16 bit, 48 kHz, -26 dBov) and
Fullsize_Car1_130Kmh_binaural.wav (16 bit, 48 kHz) are converted to *.raw (the noise also to mono), downsampled to
16 kHz, band-pass filtered with the 78 KBP filter and normalized to -26 dBov (sv56demo), giving i01 - SRC.raw.
Further blocks process the speech with NS Levels 1 to 4, scale and mix speech and noise at 0, 12, 24 and 36 dBA SNR
and normalize to -26 dBov. Outputs: i02 - SRC_0dBA_SNR.raw, i03 - SRC_12dBA_SNR.raw, i04 - SRC_24dBA_SNR.raw,
i05 - SRC_36dBA_SNR.raw, i06 - NSLVL1.raw, i07 - NSLVL2.raw, i08 - NSLVL3.raw, i09 - NSLVL4.raw,
i10 - SRC_NSLVL3_24dBA_SNR.raw, i11 - SRC_NSLVL2_12dBA_SNR.raw, i12 - SRC_NSLVL1_0dBA_SNR.raw.]

Figure D.2: Wideband processing for reference set

[Block diagram not reproduced. Recoverable content: the same inputs are converted to *.raw (the noise also to mono) and
processed by low-pass (LP35) and high-pass (MSIN) filtering and by downsampling stages to 16 kHz and 8 kHz, then
normalized to -26 dBov (sv56demo), giving i01 - SRC.raw. Further blocks process the speech with NS Levels 1 to 4,
scale and mix speech and noise at 0, 12, 24 and 36 dBA SNR and normalize to -26 dBov. The output files i02 to i12 are
named as in figure D.2.]

Figure D.3: Narrowband processing for reference set